Finality based event syncing #67
- …ning to finality-based event syncing for stronger cryptographic guarantees and reliability in event processing
- …y in the document by removing unnecessary details and simplifying descriptions
- …activation and define specific fork mechanism and activation epochs separately
- feat(finality_based_event_syncing.md): update behavioral changes to include finality-based syncing and define pre-fork and post-fork behaviors
- feat(finality_based_event_syncing.md): simplify key implementation points for one-way transition and automatic detection of fork activation
Pull Request Overview
This PR introduces a new SIP that replaces the current follow-distance syncing approach with a finality-based mechanism post-fork, aiming to reduce reorg risks and align with Ethereum’s security model.
- Proposes switching from probabilistic follow-distance to finalized-block syncing after a configurable fork epoch
- Defines behavioral changes, transition flow, performance trade-offs, and migration steps
- Includes diagrams, performance impact analysis, and open discussion points
Comments suppressed due to low confidence (3)
sips/finality_based_event_syncing.md:1
- The Author field is currently a placeholder ('xxx'). Please replace it with the actual author name or remove the placeholder.
| xxx | Finality-Based Event Syncing | Core | open-for-discussion | 2025-05-27 |
sips/finality_based_event_syncing.md:47
- [nitpick] The code name 'Alan' isn’t defined or explained elsewhere. Consider clarifying its meaning or using a more descriptive label for the pre-fork mode.
**Pre-Fork (Alan)**:
sips/finality_based_event_syncing.md:131
- [nitpick] There’s an extra leading space before 'typically'. Removing it will keep list formatting consistent.
typically 2 epochs, so ~12.8 minutes) represents a significant change.
- …ime-based and computed as `finalized_block.timestamp - processed_block.timestamp` for improved accuracy
- feat(finality_based_event_syncing.md): adjust event processing to occur only when latency is less than or equal to staleness threshold
- feat(finality_based_event_syncing.md): redefine node health as `finalized_timestamp - processed_timestamp ≤ staleness_threshold` for clarity and consistency
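As a rough illustration of that health definition, a check along these lines might look as follows. This is a minimal Go sketch; the names and the threshold value are assumptions for illustration, not taken from the SSV codebase:

```go
package main

import (
	"fmt"
	"time"
)

// Example threshold: 2 epochs of 32 slots at 12s each (~12.8 minutes).
// The concrete value is an assumption here; the SIP leaves it configurable.
const stalenessThreshold = 2 * 32 * 12 * time.Second

// isHealthy mirrors the health definition from the commits above:
// finalized_timestamp - processed_timestamp <= staleness_threshold.
func isHealthy(finalizedTimestamp, processedTimestamp time.Time) bool {
	return finalizedTimestamp.Sub(processedTimestamp) <= stalenessThreshold
}

func main() {
	finalized := time.Now()
	processed := finalized.Add(-5 * time.Minute)
	fmt.Println("healthy:", isHealthy(finalized, processed)) // true: 5m <= ~12.8m
}
```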
GalRogozinski left a comment:
Very well written SIP :-)
I just don't think the staleness part should be here (I understand you crash the node if it is stale).
I am NOT saying you shouldn't implement this in SSV node.
The thing is, I think this is a subjective approach that an implementation can take;
i.e., operators can rely on external monitoring tools to understand the health of their nodes.
So unless the Sigma Prime guys want to standardize this, it should be left out.
- …taleness check and health monitoring criteria for finality-based syncing
dknopik left a comment:
Looks good!
> 1. **Pre-Fork**: Node operators must update their SSV node software to a version supporting the "FinalityConsensus" fork well in advance of the announced activation epoch for their respective network. Monitor official announcements for activation epoch details.
> 2. **During Fork Activation**: The transition should be automatic once the node's current epoch reaches the "FinalityConsensus" activation epoch. Operators should monitor node logs for confirmation of the new operational mode.
> 3. **Post-Fork**: Verify that the node is processing events based on finalized blocks. Node operators may implement their own monitoring solutions as needed.
This is very vague. When exactly do we stop syncing with the follow method? There are multiple ways this could be interpreted:
Let X be the first slot of the activation epoch. Let B(slot) be the head block of slot (which may be a block in a previous slot if slot is empty). Let E(slot) be the epoch of slot.
a) We simply stop syncing at X: the last block processed will be B(X - 1) - 8 until E(X - 1) finalizes.
b) We stop syncing after processing B(X - 1): basically syncing until the follow distance has caught up with the epoch end.
Which one do you mean?
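For concreteness, here is a minimal Go sketch of where each reading would stop the old syncing method; the helper names are hypothetical, and the 8-block follow distance is taken from the SIP's diagram:

```go
package main

import "fmt"

// followDistance matches the 8-block follow distance shown in the SIP's diagram.
const followDistance = 8

// Interpretation (a): stop follow-distance syncing at slot X. The last block
// processed via the old method is B(X-1) minus the follow distance; nothing
// more is processed until E(X-1) finalizes.
func lastBlockA(headBlockAtXMinus1 uint64) uint64 {
	return headBlockAtXMinus1 - followDistance
}

// Interpretation (b): keep follow-distance syncing until B(X-1) itself has
// been processed, i.e. until the follow distance catches up with the epoch end.
func lastBlockB(headBlockAtXMinus1 uint64) uint64 {
	return headBlockAtXMinus1
}

func main() {
	b := uint64(1000) // hypothetical block number of B(X-1)
	fmt.Println("(a) last block synced via follow method:", lastBlockA(b)) // 992
	fmt.Println("(b) last block synced via follow method:", lastBlockB(b)) // 1000
}
```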
> - **Enhanced Security**: Transitioning to cryptographic finality for event processing inherently strengthens the node against state corruption or inconsistencies caused by execution layer reorganizations.
> - **Trust Model**: Reliance shifts more explicitly towards the finality guarantees provided by Ethereum's consensus layer, which is a core security assumption of the PoS network.
> - **Attack Surface**: This change is expected to reduce the attack surface related to manipulating node state through EL reorgs.
This does not mention the significant disadvantage that it becomes impossible to remove validators during long periods of non-finality. See my comment below for my ideas on how to mitigate this.
```mermaid
graph TB
    subgraph "Proposed"
        A2[New Block] --> B2[Check Finality]
        B2 --> C2[Wait Finalized]
        C2 --> D2[Fetch Events]
        D2 --> E2[Safe Process]
    end

    subgraph "Current"
        A1[New Block] --> B1[Wait 8 Blocks]
        B1 --> C1[Fetch Events]
        C1 --> D1[Process]
        D1 --> E1[RISK: Reorg]
    end
```
This flow diagram includes nodes that are not really part of the flow ("Safe Process", "RISK: Reorg").
The trigger for each diagram is "New Block", which models the new process incorrectly IMO. "Check Finality" sounds like there are two possible outcomes (final, not final), but there is only one arrow.
I'd just put something like:

```mermaid
graph TB
    subgraph "Proposed"
        A2[New finalized checkpoint] --> B2[Fetch events of new finalized blocks]
    end
```

to reflect that the trigger for sync progress is a new finalized checkpoint.
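In code, that finality-triggered model might look roughly like the following. This is a minimal Go sketch with hypothetical types, not the actual SSV node interfaces:

```go
package main

import "fmt"

// Checkpoint stands in for a beacon chain finalized checkpoint; the field
// names are hypothetical.
type Checkpoint struct {
	Epoch          uint64
	FinalizedBlock uint64 // highest EL block number covered by the checkpoint
}

// syncOnFinality fetches events only when a new finalized checkpoint arrives,
// covering exactly the range of newly finalized blocks.
func syncOnFinality(checkpoints <-chan Checkpoint, fetchEvents func(from, to uint64)) {
	var lastFinalized uint64
	for cp := range checkpoints {
		if cp.FinalizedBlock <= lastFinalized {
			continue // nothing newly finalized
		}
		fetchEvents(lastFinalized+1, cp.FinalizedBlock)
		lastFinalized = cp.FinalizedBlock
	}
}

func main() {
	ch := make(chan Checkpoint, 2)
	ch <- Checkpoint{Epoch: 100, FinalizedBlock: 1000}
	ch <- Checkpoint{Epoch: 101, FinalizedBlock: 1032}
	close(ch)
	syncOnFinality(ch, func(from, to uint64) {
		fmt.Printf("fetching events for blocks %d..%d\n", from, to)
	})
}
```

A real implementation would presumably initialize `lastFinalized` from the last block processed before the fork rather than from zero.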
IMO, we can introduce some mechanism to mitigate the long non-finality issue a bit. Basically, we want to be able to remove validators in order to run them somewhere else, or at least exit them from the active validator set.

**Always perform exit events**

Process validator exit events regardless of finality. This is safe, since it does not change validator ownership: we do not have a slashing risk, as the validator is not removed from the cluster (and therefore cannot be re-added to another). Also, it does not matter if the event is reorged, as it does not affect the state.

**Pending removal flag**

Another idea is immediately checking all new blocks regardless of finality. For all validator removals in such a block, the validator is marked "pending removal", and operators stop performing duties for all "pending removal" validators. The only way to remove a "pending removal" flag is by actually removing the validator (i.e. finalizing a block with a remove validator event). This should be reasonably safe: the user can observe whether the transaction causes the validator to stop. Even if the event reorgs, the operators will remember the flag and not perform duties. Great care should be taken to persist this flag! Ideally, the node should remember it even if the user decides to resync the node, e.g. by storing this information in a separate file. The disadvantage is that the user still cannot really use the validator elsewhere, as the user does not have a slashing protection database for that validator. However, it is not impossible to, e.g., get the slashing protection data from an operator and use it in a conventional validator client. Note that it is not possible to make this 100% safe, as operators may lose this information, so it is not my favoured solution.

**Disable validator config**

This is similar to the above solution, except it is just support for disabling a certain validator in an operator via a configuration file. The operator could do this at the request of the validator owner. This would be very helpful for private clusters. Note that the paragraph regarding the slashing protection data from above still applies.

Happy to hear comments on these ideas from you!
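To make the "pending removal" bookkeeping concrete, here is a minimal Go sketch; the file name, types, and functions are hypothetical, not actual SSV node code:

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// pendingRemovalFile is a hypothetical separate file, following the suggestion
// that the flag should survive even a full node resync.
const pendingRemovalFile = "pending_removal.json"

// PendingRemovals tracks validators whose removal was seen in a block that is
// not yet finalized, keyed by validator public key.
type PendingRemovals map[string]bool

func load() (PendingRemovals, error) {
	p := PendingRemovals{}
	data, err := os.ReadFile(pendingRemovalFile)
	if os.IsNotExist(err) {
		return p, nil // no flags persisted yet
	}
	if err != nil {
		return nil, err
	}
	return p, json.Unmarshal(data, &p)
}

// markPending flags a validator and persists the flag immediately, since
// losing it could let a removed validator keep performing duties.
func (p PendingRemovals) markPending(pubkey string) error {
	p[pubkey] = true
	data, err := json.Marshal(p)
	if err != nil {
		return err
	}
	return os.WriteFile(pendingRemovalFile, data, 0o600)
}

// shouldPerformDuties: a flagged validator performs no duties until its
// removal event is observed in a finalized block.
func (p PendingRemovals) shouldPerformDuties(pubkey string) bool {
	return !p[pubkey]
}

func main() {
	p, err := load()
	if err != nil {
		panic(err)
	}
	if err := p.markPending("0xabc123"); err != nil {
		panic(err)
	}
	fmt.Println("perform duties:", p.shouldPerformDuties("0xabc123")) // false
}
```

Persisting on every flag write, rather than only on shutdown, matches the "great care should be taken to persist this flag" point above.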
Hi @dknopik, I'm not sure the …