Serialize Messages as Bytes #88
base: master
Conversation
Including n in serialization: [...] we could encode n within the first four bytes of the serialization. [...] However, since we'd have to repeat this encoding at multiple layers (chilldkg, encpedpop, simplepedpop), it would result in redundant and less clean serialization.
I think that's a good reason for not going with this alternative.
Questions
Why does VSSCommitment.from_bytes_and_t explicitly require t? Can't we infer t by computing len(b) // 33 like we do in deserialize_recovery_data? If this explicit approach was intentional, should we consistently implement explicit parameters (from_bytes_and_t) for other similar types, even when t can be inferred (e.g., simplepedpop.ParticipantMsg.from_bytes)?
I think the current implementation of computing `t` in `simplepedpop.ParticipantMsg.from_bytes` from the length of the data itself is broken. That's because it's never checked that this `t` is actually correct. The `coordinator_step` function gets `t` as an argument, but it's never checked that it actually matches the `t` in the participant messages. AFAICS this will lead to a crash in the computation of `sum_coms_to_nonconst_terms`.
Similarly, if the `t` computed in the `CoordinatorMsg`'s `from_bytes` function is wrong, then the assertion `assert len(sum_coms_to_nonconst_terms) == t - 1` fires, instead of a ValueError being raised as for other parsing errors.
These kinds of bugs are a good argument for doing as much error handling as possible in the parsing functions themselves. So I'd suggest to "consistently implement explicit parameters" (such as `t`) for the from_bytes functions and make sure that the error is handled inside the function. Then we can, for example, also remove the assertion `len(sum_coms_to_nonconst_terms) == t - 1`.
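Concretely, passing `t` explicitly and validating inside the parser could look roughly like this. This is a minimal sketch, assuming the 33*t-byte commitment plus 64-byte proof-of-possession layout discussed elsewhere in this PR; it is not the repo's actual code, and the commitment is kept as raw bytes to stay self-contained:

```python
from typing import NamedTuple

class ParticipantMsg(NamedTuple):
    com: bytes   # 33*t-byte VSS commitment (kept as raw bytes in this sketch)
    pop: bytes   # 64-byte proof of possession

    @staticmethod
    def from_bytes_and_t(b: bytes, t: int) -> "ParticipantMsg":
        # All length validation happens here, so callers get a ValueError at
        # parse time instead of a later assertion failure or crash.
        if t < 1 or len(b) != 33 * t + 64:
            raise ValueError
        return ParticipantMsg(com=b[: 33 * t], pop=b[33 * t :])
```

A caller such as `coordinator_step` would then pass its own `t`, and a mismatch surfaces as a ValueError during parsing rather than as a crash in `sum_coms_to_nonconst_terms`.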
Currently, I've also implemented serialization/deserialization methods for DKGOutput and SessionParams. However, these types aren't actually exchanged between participants and the coordinator, so their serialization isn't used within the API inputs/outputs. Shall I remove these unused methods?
I'd remove them because:
- I'd expect applications will typically have their own way of sending and collecting SessionParams.
- DKGOutput is not a protocol message that is sent between spec-compliant participants.
python/chilldkg_ref/simplpedpop.py
Outdated
# Read com (33*t bytes)
if len(rest) < 33 * t:
    raise ValueError
com, rest = (
    VSSCommitment.from_bytes_and_t(rest[: 33 * t], t),
    rest[33 * t :],
)

if len(rest) != 0:
    raise ValueError
It seems like both these ValueErrors are never raised, because after `t` is computed via divmod we have

t == (len(rest) - remainder) / 33   # (and t is an integer) by definition of divmod
t == len(rest) / 33                 # because we fail if remainder != 0
It seems like there are more cases of unnecessary conditions in other parsing functions.
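To illustrate the argument, a standalone toy check (not repo code): once the remainder has been verified to be zero, the buffer length is exactly `33 * t`, so both the `len(rest) < 33 * t` check and the trailing-bytes check are unreachable.

```python
# Toy demonstration of the reasoning above (not repo code).
rest = bytes(33 * 4)  # any length that is an exact multiple of 33
t, remainder = divmod(len(rest), 33)
if remainder != 0:
    raise ValueError

assert len(rest) == 33 * t         # holds by definition of divmod
assert not (len(rest) < 33 * t)    # first ValueError branch is dead code
leftover = rest[33 * t :]
assert len(leftover) == 0          # trailing-bytes ValueError is dead code too
```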
Good catch, thanks! I modeled this function after the `deserialize_recovery_data` implementation. By that same reasoning, the ValueErrors below that come after divmod won't be triggered either, correct?
bip-frost-dkg/python/chilldkg_ref/chilldkg.py
Lines 385 to 416 in 0f9e4b9
# Compute n
n, remainder = divmod(len(rest), (33 + 33 + 32 + 64))
if remainder != 0:
    raise ValueError
# Read hostpubkeys (33*n bytes)
if len(rest) < 33 * n:
    raise ValueError
hostpubkeys, rest = [rest[i : i + 33] for i in range(0, 33 * n, 33)], rest[33 * n :]
# Read pubnonces (33*n bytes)
if len(rest) < 33 * n:
    raise ValueError
pubnonces, rest = [rest[i : i + 33] for i in range(0, 33 * n, 33)], rest[33 * n :]
# Read enc_secshares (32*n bytes)
if len(rest) < 32 * n:
    raise ValueError
enc_secshares, rest = (
    [Scalar.from_bytes_checked(rest[i : i + 32]) for i in range(0, 32 * n, 32)],
    rest[32 * n :],
)
# Read cert
cert_len = certeq_cert_len(n)
if len(rest) < cert_len:
    raise ValueError
cert, rest = rest[:cert_len], rest[cert_len:]
if len(rest) != 0:
    raise ValueError
return (t, sum_coms, hostpubkeys, pubnonces, enc_secshares, cert)
I think you're right, but I'd still keep it, because it's easier to reason about than when the ValueError is removed: We know that after the check that `remainder == 0`, the following holds in the integers (and n is positive):

n = len(rest) / (33 + 33 + 32 + 64)

So

(33 + 33 + 32 + 64) * n = len(rest)

Since (33 + 33 + 32 + 64) * n >= 33 * n, we have that len(rest) >= 33 * n.
python/chilldkg_ref/simplpedpop.py
Outdated
# Read Pop (64 bytes)
if len(rest) < 64:
    raise ValueError
Maybe we should introduce a dedicated parsing error (can be a later PR)
I’d like to add this parsing error in a new commit under the same PR, since it needs to be considered when writing the test vectors
fa4e7c6 Introduce `GE.from_bytes_compressed_with_infinity` (siv2r)

Pull request description:

I'm working on [adding serialization/deserialization methods to ChillDKG](BlockstreamResearch/bip-frost-dkg#88) and need a helper function for deserializing coordinator messages. The function I need would map the 33 zero bytes to the infinity point. For context: BlockstreamResearch/bip-frost-dkg#88 (comment)

I named the method `from_bytes_with_infinity` to match the existing `to_bytes_with_infinity` method in GE, but `from_zero_bytes` might be a more accurate name.

ACKs for top commit:
theStack: ACK fa4e7c6
real-or-random: utACK fa4e7c6

Tree-SHA512: 128d18a3868eec82c09655cbffb13f8f146e9fc328637cce648ef7a320eb01647e2491618bd506d8e20d57659b4ae0ec3e213b05fa93cb06c2c43830fcaf5510
The (de)ser methods were added for the following types:
- ParticipantMsg1
- ParticipantMsg2
- CoordinatorMsg1
- CoordinatorMsg2
- CoordinatorInvestigationMsg

These message types are now represented as bytes in user-facing APIs. Executed `./all.sh` to update the README file with the new message types.
I’ve updated the deserialization functions to follow this style. The code won’t compile yet because it needs the `GE` deserialization helper from the upstream PR.
I’ve been thinking about how to structure custom parsing errors, and came up with a few possible designs. I’m leaning toward Design 3 with Design 2 (using string arguments) under the hood. Would like to know your thoughts on this.

Design 1: Layer-specific subclasses

# util.py
class MessageParseError(ValueError):
    """Base class for message parsing errors."""

# simplpedpop.py
class ParticipantMessageError(MessageParseError):
    pass

class CoordinatorMessageError(MessageParseError):
    pass

class CoordinatorInvestigationMessageError(MessageParseError):
    pass

# (same pattern repeated in encpedpop.py and chilldkg.py)

This design fits the current style and makes type-specific catching easy. The downside is boilerplate: nine new classes across three layers.

Design 2: Message type as an argument

# util.py
from typing import Type

class MessageParseError(ValueError):
    """Base class for message parsing errors."""

    def __init__(self, message_type: Type, msg: str = ""):
        self.message_type = message_type
        super().__init__(msg)

# Usage
def from_bytes_and_t(b: bytes, t: int) -> ParticipantMsg:
    if len(rest) < 64:
        raise MessageParseError(simplpedpop.ParticipantMsg)

# we can also do `raise MessageParseError("ParticipantMsg")` instead
# we don't need to worry about converting exceptions caught before propagating them to higher layers

This design keeps the number of classes small. The downside is that it deviates from the current design pattern.

Design 3: Unified base for all parsing errors

# util.py
class ParseError(ValueError):
    """Base for all parsing errors in the protocol."""

class MessageParseError(ParseError):
    """Base for message parsing errors."""
    # Could use Design 1 or 2 here

class RecoveryDataError(ParseError):
    """Failed to parse recovery data."""

This option creates a single umbrella for all parsing errors.
I think we need to also look top-down: what errors do we need to produce to give the user the information they need and to allow us to test the API? For example, in the participant functions of ChillDKG it's sufficient to return a generic message parsing error, because there's always just one message argument in the API that can fail to parse. On the other hand, the coordinator functions in chilldkg get multiple messages, and we want to raise an error containing the index of the failing message. What design best facilitates this?
Ahh, I see. I'll implement a design that meets this requirement in a new commit, and we can iterate from there.
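One hedged sketch of how the coordinator side could attach the failing participant's index while the per-message parsers keep raising plain parse errors; the class and function names here are illustrative, not the repo's API, and it assumes a `ParticipantMsg.from_bytes_and_t` parser like the one sketched earlier in this thread:

```python
from typing import List

class FaultyParticipantMsgError(Exception):  # hypothetical name, not the repo's API
    def __init__(self, participant_index: int, cause: Exception):
        self.participant_index = participant_index
        super().__init__(f"participant {participant_index}: {cause}")

def parse_participant_msgs(pmsgs_bytes: List[bytes], t: int) -> List["ParticipantMsg"]:
    pmsgs = []
    for i, b in enumerate(pmsgs_bytes):
        try:
            # Assumes a per-message parser as sketched earlier in this thread.
            pmsgs.append(ParticipantMsg.from_bytes_and_t(b, t))
        except ValueError as e:
            # Re-raise with the index so the caller learns *which* message is bad.
            raise FaultyParticipantMsgError(i, e) from e
    return pmsgs
```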
Also, removed redundant parsing in the coordinator_investigate() function in chilldkg.py
I've updated the parsing code to raise the new parsing errors.
Addresses #25
This pull request updates ChillDKG APIs to serialize messages between participants and the coordinator as `bytes`. Specifically, these API input/output types have been updated to use `bytes`:
- ParticipantMsg1
- ParticipantMsg2
- CoordinatorMsg1
- CoordinatorMsg2
- CoordinatorInvestigationMsg
To support this, I've implemented `to_bytes()` and `from_bytes()` (or `from_bytes_and_n()`) methods for each named tuple. Additionally, I've introduced the `GE.from_bytes_with_infinity()` method (EDIT: created a PR for this function in upstream).
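Roughly, the pattern these methods follow looks like this; it is a simplified sketch with hypothetical field names, not the exact layout of any of the PR's messages:

```python
from typing import List, NamedTuple

class ExampleMsg(NamedTuple):
    pubnonces: List[bytes]       # n compressed points, 33 bytes each (illustrative)
    enc_secshares: List[bytes]   # n scalars, 32 bytes each (illustrative)

    def to_bytes(self) -> bytes:
        # Concatenate the fixed-size fields in a fixed order.
        return b"".join(self.pubnonces) + b"".join(self.enc_secshares)

    @staticmethod
    def from_bytes_and_n(b: bytes, n: int) -> "ExampleMsg":
        # Validate the total length for the given n, then slice the fields back out.
        if len(b) != 33 * n + 32 * n:
            raise ValueError
        pubnonces = [b[i : i + 33] for i in range(0, 33 * n, 33)]
        rest = b[33 * n :]
        enc_secshares = [rest[i : i + 32] for i in range(0, 32 * n, 32)]
        return ExampleMsg(pubnonces, enc_secshares)
```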
methodAlternative Designs
Including
n
in serialization: Instead of usingfrom_bytes_and_n()
, we could encoden
within the first four bytes of the serialization. This approach would remove the explicitn
parameter. However, since we'd have to repeat this encoding at multiple layers (chilldkg
,encpedpop
,simplepedpop
), it would result in redundant and less clean serialization.Type Hinting: Currently, messages in API input/output are directly typed as
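For reference, the rejected alternative would amount to a length prefix along these lines; the 4-byte big-endian encoding is only an assumption for illustration:

```python
# Sketch of the rejected alternative: prepend n to the payload so that
# deserialization does not need an explicit n parameter.
def serialize_with_n(n: int, payload: bytes) -> bytes:
    return n.to_bytes(4, "big") + payload

def split_n(b: bytes) -> "tuple[int, bytes]":
    if len(b) < 4:
        raise ValueError
    return int.from_bytes(b[:4], "big"), b[4:]
```

Every layer (chilldkg, encpedpop, simplepedpop) would need the same prefix, which is the redundancy mentioned above.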
Type Hinting: Currently, messages in API input/output are directly typed as `bytes`, which isn't ideal for type clarity. An alternative would be to create explicit aliases like `ParticipantMsg1Bytes` and `CoordinatorMsg1Bytes`, similar to what we have with `RecoveryData`. Another option would be to rename the current structured types (e.g., to `ParticipantMsg1Internal` or `ParticipantMsg1Struct`) and use the existing names as byte aliases.
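The aliasing options could look roughly like this; this is only a sketch, and using `NewType` versus a plain alias is just one possible choice:

```python
# Sketch of the type-hinting alternatives (illustrative only).
from typing import NewType

# Option A: explicit byte aliases alongside the structured types,
# similar to the existing RecoveryData alias.
ParticipantMsg1Bytes = NewType("ParticipantMsg1Bytes", bytes)
CoordinatorMsg1Bytes = NewType("CoordinatorMsg1Bytes", bytes)

# Option B: rename the structured named tuples (e.g. ParticipantMsg1Internal)
# and reuse the original names as plain byte aliases.
ParticipantMsg1 = bytes
```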
Questions

Why does `VSSCommitment.from_bytes_and_t` explicitly require `t`? Can't we infer `t` by computing `len(b) // 33` like we do in `deserialize_recovery_data`? If this explicit approach was intentional, should we consistently implement explicit parameters (`from_bytes_and_t`) for other similar types, even when `t` can be inferred (e.g., `simplepedpop.ParticipantMsg.from_bytes`)?

Currently, I've also implemented serialization/deserialization methods for `DKGOutput` and `SessionParams` (EDIT: removed these (de)ser methods). However, these types aren't actually exchanged between participants and the coordinator, so their serialization isn't used within the API inputs/outputs. Shall I remove these unused methods?