
Conversation

@iurii-ssv
Contributor

@iurii-ssv iurii-ssv commented Dec 24, 2025

Resolves #2632

This PR changes the validation-rules treatment of duplicate messages. A duplicate message is one that should not be applied more than once from the QBFT-logic perspective (and this is "additionally" enforced via message-validation rules). The SSV node wants to detect duplicate messages in order to:

  • reduce the amount of unnecessary traffic in the p2p network as much as possible
  • actively punish peers (through the libp2p reputation system) who don't follow protocol rules

The behavior this PR aims to change:

  • previously, a duplicate message would not be detected correctly if it came from a different peer (a peer with a different peerID) ... and so the SSV node would accept such a message and broadcast it to other peers, getting punished for it
  • with this PR, a duplicate message is always detected, and then, depending on whether it comes from the peer who already sent that/similar message to us or from another/different peer, it is rejected or ignored respectively (punishing only those peers who "knowingly" broadcast duplicate messages; the "knowingly" part comes from the libp2p property that every node in p2p pubsub is fully responsible for validating the messages it chooses to re-broadcast ... so it must validate that a message isn't a duplicate before deciding to re-broadcast it)

In order to be able to tell whether a duplicate p2p message is coming from a new/different peer (or from the same peer we've received that/similar message from in the past), we would need to store & update a message->peer mapping ... that is, however, somewhat expensive (and doesn't fit well with the current minimalistic/optimized implementation we have for message validation). Instead, in this PR, we keep track of "seen validation-rule violations":

  • 1 violation is always allowed (if some peer has already sent some message and then sends the same/similar message 1 more time, they will not be punished by message rejection that 1 time; the receiving SSV node will simply ignore such a message the 1st time)
  • 2nd+ violations are spotted via a check against "seen validation-rule violations" (the new structure we now track per peer, on an as-needed basis): if the violating message comes from a peer who has already been observed to commit this/similar kind of violation in the past (for this Operator+DutyType+slot), we reject the message; otherwise we ignore it, since it is the 1st violation of this type for this peer (a minimal sketch of this logic follows below)
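
A minimal sketch of this violation-tracking idea (illustrative only: the real code lives in message/validation and exposes this as the IgnoreOrReject method; ViolationKind and shouldReject here are hypothetical stand-ins):

```go
package validation

import "github.com/libp2p/go-libp2p/core/peer"

// ViolationKind identifies a class of validation-rule violation,
// e.g. "duplicated message" (a stand-in for the PR's error kinds).
type ViolationKind string

// SignerState sketch: only the violation-tracking part is shown.
type SignerState struct {
	// SeenViolations records, per peer, which violation kinds we have
	// already observed; lazily initialized to reduce memory consumption.
	SeenViolations map[peer.ID]map[ViolationKind]struct{}
}

// shouldReject reports whether this violation by this peer is a repeat
// (reject) or a first occurrence (ignore), recording it either way.
func (s *SignerState) shouldReject(p peer.ID, kind ViolationKind) bool {
	if s.SeenViolations == nil {
		s.SeenViolations = make(map[peer.ID]map[ViolationKind]struct{})
	}
	seen, ok := s.SeenViolations[p]
	if !ok {
		seen = make(map[ViolationKind]struct{})
		s.SeenViolations[p] = seen
	}
	if _, repeated := seen[kind]; repeated {
		return true // 2nd+ violation of this kind from this peer: reject
	}
	seen[kind] = struct{}{} // 1st violation: record it, but only ignore
	return false
}
```

Either way the violating message is dropped; the reject path additionally lets libp2p downscore the sending peer.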

Additional considerations:

  • previously, messageValidator.state and messageValidator.validationLockCache maps were keyed by spectypes.MessageID+peerID ... that doesn't look correct to me because the "state" doesn't/shouldn't know or depend on which peer a message is received from
  • in this PR, messageValidator.state and messageValidator.validationLockCache maps are keyed by spectypes.MessageID (which is basically = Operator+DutyType) only, meaning messages with the same spectypes.MessageID targeting different slots will acquire the same validation lock and will be validated sequentially (see the sketch after this list) ... I think we could potentially optimize that further by making validation for messages targeting different slots run concurrently, but it would be hard to implement correctly on top of the existing validation code (and it is out of scope for this PR anyway)
  • SignerState.SeenMsgTypes keeps track of both QBFT-instance-related messages and pre-consensus phase messages, and upon round change the SignerState.Reset call fully resets not only QBFT-instance-related state but pre-consensus phase messages as well. This means we are "allowing the receipt of pre-consensus phase messages again once a round change happened", which doesn't make much sense. But it's not really abusable either, so there is no need to address it - just wanted to document that behavior
  • we also have a bunch of errors this PR doesn't touch that we classify as ignore (and never as reject); as a potential future improvement, we might want to penalize peers sending lots of duplicate messages that end up as those errors as well
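
For illustration, the per-MessageID locking could look roughly like this (a sketch assuming a plain map; the actual validationLockCache is a cache and also handles eviction):

```go
package validation

import (
	"sync"

	spectypes "github.com/ssvlabs/ssv-spec/types"
)

// validationLocks is an illustrative stand-in for validationLockCache:
// one mutex per spectypes.MessageID, with no peerID in the key, so that
// duplicates are detected no matter which peer sends the message.
type validationLocks struct {
	mu    sync.Mutex
	locks map[spectypes.MessageID]*sync.Mutex
}

// lockFor returns the mutex guarding validation for the given message ID,
// creating it on first use; messages sharing a MessageID (even when they
// target different slots) are therefore validated sequentially.
func (v *validationLocks) lockFor(id spectypes.MessageID) *sync.Mutex {
	v.mu.Lock()
	defer v.mu.Unlock()
	if v.locks == nil {
		v.locks = make(map[spectypes.MessageID]*sync.Mutex)
	}
	l, ok := v.locks[id]
	if !ok {
		l = &sync.Mutex{}
		v.locks[id] = l
	}
	return l
}
```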

@iurii-ssv iurii-ssv requested review from a team as code owners December 24, 2025 17:24
@iurii-ssv iurii-ssv marked this pull request as draft December 24, 2025 17:24
@greptile-apps
Contributor

greptile-apps bot commented Dec 24, 2025

Greptile Summary

Reworked duplicate message detection to be peer-independent while selectively punishing repeat offenders. Changed validation state keying from peerID+messageID to just messageID, ensuring duplicates are detected regardless of which peer sends them. Introduced IgnoreOrReject mechanism that ignores first-time violations (since detecting the original sender is expensive) but rejects subsequent violations from the same peer (easily detectable via SeenViolations tracking).

Key changes:

  • Validation lock and state now keyed by spectypes.MessageID only (not peerID+messageID)
  • New SignerState.SeenViolations map tracks which peers have sent which violation types
  • IgnoreOrReject method returns ignore error for first violation, reject error for repeats
  • Moved ErrDuplicatedMessage, ErrDifferentProposalData, ErrDecidedWithSameSigners, and ErrTooManyPartialSigMessage from reject to ignore category
  • Applied ignore-then-reject pattern to duplicate consensus messages, different proposal data, duplicate decided messages, and excessive partial signatures
  • Method renames for clarity: Signer → OperatorState, GetSignerState → GetSignerStateForSlot

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The changes are well-structured with clear logic for handling duplicate messages. The ignore-then-reject pattern correctly balances protection against malicious peers while avoiding false positives from network issues. Method renames improve code clarity without changing behavior. The validation state keying change is the core fix that enables peer-independent duplicate detection.
  • No files require special attention

Important Files Changed

| Filename | Overview |
|---|---|
| message/validation/signer_state.go | Added SeenViolations tracking and IgnoreOrReject method to distinguish first-time violations (ignored) from repeated violations by the same peer (rejected) |
| message/validation/errors.go | Moved duplicate message errors from reject to ignore category, renamed error constants for clarity |
| message/validation/validation.go | Changed validation lock key from peerID+messageID to just messageID, ensuring duplicate detection is peer-independent |
| message/validation/consensus_validation.go | Updated duplicate message validation to use IgnoreOrReject for different proposal data, decided messages with same signers, and consensus message limits |
| message/validation/partial_validation.go | Applied IgnoreOrReject pattern to partial signature message validation, renamed error constants for consistency |

Sequence Diagram

```mermaid
sequenceDiagram
    participant Peer1 as Peer 1 (Node1A)
    participant Node as SSV Node
    participant Peer2 as Peer 2 (Node1B)
    participant State as SignerState
    
    Note over Node,State: First message arrives
    Peer1->>Node: Send message (msgID=Op+Duty+Slot+Round)
    Node->>State: Check if duplicate via SeenMsgTypes
    State-->>Node: Not seen yet
    Node->>State: Record message type
    Node->>State: Track violation: none
    Node->>Peer1: ACCEPT & broadcast
    
    Note over Node,State: Duplicate from SAME peer
    Peer1->>Node: Send same message again
    Node->>State: Check if duplicate via SeenMsgTypes
    State-->>Node: Already seen (limit reached)
    Node->>State: Check SeenViolations[Peer1][ErrDuplicatedMessage]
    State-->>Node: Not seen from this peer before
    Node->>State: Record SeenViolations[Peer1][ErrDuplicatedMessage]
    Node->>Peer1: IGNORE (first violation)
    
    Note over Node,State: Second duplicate from SAME peer
    Peer1->>Node: Send same message third time
    Node->>State: Check if duplicate via SeenMsgTypes
    State-->>Node: Already seen (limit reached)
    Node->>State: Check SeenViolations[Peer1][ErrDuplicatedMessage]
    State-->>Node: Already seen from this peer!
    Node->>Peer1: REJECT (repeated violation)
    
    Note over Node,State: Duplicate from DIFFERENT peer
    Peer2->>Node: Send same message (different peerID)
    Node->>State: Check if duplicate via SeenMsgTypes
    State-->>Node: Already seen (limit reached)
    Node->>State: Check SeenViolations[Peer2][ErrDuplicatedMessage]
    State-->>Node: Not seen from this peer before
    Node->>State: Record SeenViolations[Peer2][ErrDuplicatedMessage]
    Node->>Peer2: IGNORE (first violation for this peer)
```

Contributor

@greptile-apps greptile-apps bot left a comment


13 files reviewed, 1 comment


@codecov

codecov bot commented Dec 27, 2025

Codecov Report

❌ Patch coverage is 82.75862% with 25 lines in your changes missing coverage. Please review.
✅ Project coverage is 56.1%. Comparing base (b6a5fe0) to head (e3f2cfd).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| message/validation/partial_validation.go | 41.1% | 18 Missing and 2 partials ⚠️ |
| network/topics/msg_id.go | 54.5% | 3 Missing and 2 partials ⚠️ |


@iurii-ssv iurii-ssv marked this pull request as ready for review December 29, 2025 14:44


@vyzo vyzo left a comment


first pass, approach is pretty reasonable, I will do another pass.

Left a comment.


"github.com/attestantio/go-eth2-client/spec/phase0"
"github.com/libp2p/go-libp2p/core/peer"



keep this line please, it is good practice to separate our own libraries from external deps. makes audit a tad easier.
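
For context, the separation being asked for is just a blank line between import groups - stdlib, then external deps, then our own packages - e.g. (the ssv path below is illustrative):

```go
import (
	"crypto/sha256"

	"github.com/attestantio/go-eth2-client/spec/phase0"
	"github.com/libp2p/go-libp2p/core/peer"

	"github.com/ssvlabs/ssv/message/validation"
)
```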

Contributor Author

@iurii-ssv iurii-ssv Dec 31, 2025


This is actually still consistent with the current version of our make format command:

  • if you run it after removing all the newlines, it won't add them back,
  • but once those newlines are already there, it doesn't remove them

so we can end up with 2 valid but different-looking import sections (both of which are valid from the make format & make lint perspective)

Note also that currently we classify ssv-spec as any other 3rd-party repo (we don't bundle ssv-spec and ssv packages together under one import group).

I've pushed a commit to revert these imports-affecting changes for now, but in general I think we shouldn't worry much about it (or maybe use something like gci to enforce certain rules) - e3f2cfd

```go
signerCount := len(signedSSVMessage.OperatorIDs)
if signerCount > 1 {
	if signerState.SeenSigners == nil {
		signerState.SeenSigners = make(map[SignersBitMask]struct{}) // lazy init on demand to reduce mem consumption
```


do we need to garbage collect cold entries? Maybe also limit size?

I think it might be a good idea, I am concerned it might open a DoS attack vector by making us use potentially unlimited memory in this map.

Contributor Author


For SeenSigners and SeenViolations we basically drop the reference(s) to those in SignerState.Reset, letting the Go GC collect the maps we used in the past.

So it is something to consider, but from what I see in the code it looks like it's working correctly.
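
Roughly, the mechanism is just this (a sketch; the real Reset resets more state, and the field types follow the snippets above):

```go
// Reset drops the only references to the maps, making them unreachable
// so the Go GC can reclaim them; per-state memory is thus bounded
// across rounds/slots rather than growing without limit.
func (s *SignerState) Reset() {
	s.SeenSigners = nil    // map[SignersBitMask]struct{}, see snippet above
	s.SeenViolations = nil // per-peer violation tracking added in this PR
}
```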



ok, this limits the attack vector considerably. I think it should be ok too, I just get uneasy when I see maps reachable from the network.

@y0sher y0sher requested a review from MatheusFranco99 January 5, 2026 12:30
@MatheusFranco99
Contributor

> 1 violation is always allowed (if some peer has already sent some message and then sends the same/similar message 1 more time, they will not be punished by message rejection that 1 time; the receiving SSV node will simply ignore such a message the 1st time)

Hmm, I don't see a clear reason for allowing this. I think that's exactly what we want to avoid: the same peer sending the same logical message twice ("duplicated"). I think it could be rejected already on its first duplicate.


> • previously, messageValidator.state and messageValidator.validationLockCache maps were keyed by spectypes.MessageID+peerID ... that doesn't look correct to me because the "state" doesn't/shouldn't know or depend on which peer a message is received from
> • in this PR, messageValidator.state and messageValidator.validationLockCache maps are keyed by spectypes.MessageID (which is basically = Operator+DutyType) only, meaning messages with the same spectypes.MessageID targeting different slots will acquire the same validation lock and will be validated sequentially ... I think we could potentially optimize that further by making validation for messages targeting different slots run concurrently, but it would be hard to implement correctly on top of the existing validation code (and it is out of scope for this PR anyway)

Btw, this is a critical change. Per-peer state (btw, maybe it's easier to reason about it as a "network view") was introduced so that we could penalize peers more accurately, as each peer is only responsible for its own state.
Still, at least, I think having a common state could only hurt the logical rules (semantic rules don't matter here). For example, these come to mind:

(from knowledge-base)

QBFT Logic

| Verification | Error | Classification | Description |
|---|---|---|---|
| Double Proposal | ErrDuplicatedMessage | Reject | Signer already sent a proposal for round. |
| = for Prepare, Commit, and RC | - | - | - |
| Already advanced round | ErrRoundAlreadyAdvanced | Ignore | Signer is already in a future round. |

PSig Logic

| Verification | Error | Classification | Description |
|---|---|---|---|
| Invalid signature type count | ErrInvalidPartialSignatureTypeCount | Reject | Only the following are allowed: 1 PostConsensusPartialSig for Committee duty; 1 RandaoPartialSig and 1 PostConsensusPartialSig for Proposer; 1 SelectionProofPartialSig and 1 PostConsensusPartialSig for Aggregator; 1 SelectionProofPartialSig and 1 PostConsensusPartialSig for Sync committee contribution; 1 ValidatorRegistrationPartialSig for Validator Registration; 1 VoluntaryExitPartialSig for Voluntary Exit. |

Duty Logic

| Verification | Error | Classification | Description |
|---|---|---|---|
| Already advanced slot | ErrSlotAlreadyAdvanced | Ignore (Non-committee roles) | Signer already advanced to a later slot. |
| Too many duties per epoch | ErrTooManyDutiesPerEpoch | Ignore | If the role is aggregator, voluntary exit, or validator registration, 2 duties per epoch are allowed. Else if committee, 2*V (if no validator is doing sync committee). Else accept. |

For the duplicated cases, we are already solving it with this change. For the *AlreadyAdvanced and ErrTooManyDutiesPerEpoch cases, I don't think an attacker can do much (e.g. trying to populate slots for a small committee so that it wouldn't be able to send more msgs), as the msgs also need to be correctly signed, so the attacker would need to be in the committee itself.

I think it may be good for @GalRogozinski to take a look here as well

@iurii-ssv
Contributor Author

iurii-ssv commented Jan 6, 2026

@MatheusFranco99 thanks for the review,

> Hmm, I don't see a clear reason for allowing this. I think that's exactly what we want to avoid: the same peer sending the same logical message twice ("duplicated"). I think it could be rejected already on its first duplicate.

As I mentioned in the PR description, it would be hard/expensive to implement the necessary logic to track & correctly punish the very 1st violation. So instead, we just "record" the violation the 1st time it happens and only "punish" (by message reject) if it happens again.

Does that make sense? I agree that it would be better not to go for this "shortcut", but it's a good practical trade-off to take, I think (it's not like it can be abused by an attacker).

> Still, at least, I think having a common state could only hurt the logical rules (semantic rules don't matter here)

I'm not sure I understand what this ^ part refers to, could you expand? Are the changes in this PR fine to do or not?

@MatheusFranco99
Contributor

> it would be hard/expensive to implement the necessary logic to track & correctly punish the very 1st violation. So instead, we just "record" the violation the 1st time it happens and only "punish" (by message reject) if it happens again.

Hmm, sorry, I'm a bit lost here. When you say we just "record" the violation the 1st time it happens, isn't that already the necessary tracking logic being used? If not, how do you detect it? Or is this detection done by some other "approximation" mechanism?

I don't think I completely understood it by looking at the code, but is this duplication counter (for a certain peer) per logical message step (like committee: X, duty type: D, slot: S, round: R, prepare), or is it a counter for all messages?

> I'm not sure I understand what this ^ part refers to, could you expand? Are the changes in this PR fine to do or not?

This is just a side-thought/concern; I'm adding it here for others to also think about it and confirm I didn't miss something

@iurii-ssv
Contributor Author

iurii-ssv commented Jan 6, 2026

> Hmm, sorry, I'm a bit lost here. When you say we just "record" the violation the 1st time it happens, isn't that already the necessary tracking logic being used? If not, how do you detect it? Or is this detection done by some other "approximation" mechanism?

Yeah, sorry, it's a bit confusing; basically:

  • in my terminology, a violation is a validation-logic error before we classify that error as ignore or reject
  • so we need to figure out whether the violation is intentional (the same peer knowingly sending us the bad message 2+ times, in which case we want to reject the 2nd, 3rd, etc. messages) or unintentional (due to the way libp2p works, any peer can unknowingly send you a violating message 1 time, so we need to ignore it as long as it is their 1st time)
  • prior violations are tracked via the SeenViolations structure in this PR
  • also note, this PR targets (properly classifies between ignore and reject, instead of just always doing ignore) only a subset of the possible duplicate messages an SSV node might receive (specifically those observed to be sent when running a duplicate SSV node during my testing) ... ideally we'd want to treat all errors with the same approach, but maybe it's just not necessary in practice

Contributor

@GalRogozinski GalRogozinski left a comment


I am rejecting not because the change isn't good; I actually don't know if it is good.
It is simply something the spec team needs to schedule time for a correct analysis.

The rejection is just to hold off the merge for now

cc @Tom-ssvlabs

Contributor

@oleg-ssvlabs oleg-ssvlabs left a comment


gj!
Left a couple of comments.

```diff
 if len(signedSSVMessage.FullData) != 0 && signerState.HashedProposalData != nil {
-	if *signerState.HashedProposalData != sha256.Sum256(signedSSVMessage.FullData) {
-		return ErrDifferentProposalData
+	msgHashedProposalData := sha256.Sum256(signedSSVMessage.FullData)
```
Contributor


Here there is a validation which ensures consensusMessage.Root equals the hash of FullData (the same SHA256 hashing, done under the specqbft.HashDataRoot() method). That validation runs before this code.

This basically means you don't need to hash FullData here for the comparison, you can do this instead:

```go
*signerState.HashedProposalData != consensusMessage.Root
```

Hashing is not very expensive, but FullData's max size is 8 MB (considering the ssz-max tag), so this minor optimization might bring some performance improvement.

Contributor


You can also change here from:

```go
fullDataHash := sha256.Sum256(signedSSVMessage.FullData)
signerState.HashedProposalData = &fullDataHash
```

to:

```go
signerState.HashedProposalData = &consensusMessage.Root
```
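
Taken together, the two suggestions would make the duplicate-proposal handling look roughly like this (a sketch only; the surrounding validation flow and the exact IgnoreOrReject signature used here are assumptions, not the PR's literal code):

```go
// consensusMessage.Root has already been validated (earlier in the flow)
// to equal sha256.Sum256(signedSSVMessage.FullData), so it can stand in
// for re-hashing the up-to-8 MB FullData.
if len(signedSSVMessage.FullData) != 0 {
	if signerState.HashedProposalData != nil {
		if *signerState.HashedProposalData != consensusMessage.Root {
			// same signer and round, but different proposal data
			return signerState.IgnoreOrReject(peerID, ErrDifferentProposalData)
		}
	} else {
		signerState.HashedProposalData = &consensusMessage.Root
	}
}
```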
