
Conversation


@kchojn kchojn commented May 28, 2025

No description provided.

kchojn added 3 commits May 28, 2025 16:28
…ning to finality-based event syncing for stronger cryptographic guarantees and reliability in event processing.
…y in the document by removing unnecessary details and simplifying descriptions
…activation and define specific fork mechanism and activation epochs separately

feat(finality_based_event_syncing.md): update behavioral changes to include finality-based syncing and define pre-fork and post-fork behaviors
feat(finality_based_event_syncing.md): simplify key implementation points for one-way transition and automatic detection of fork activation
Copilot AI review requested due to automatic review settings May 28, 2025 13:42

Copilot AI left a comment


Pull Request Overview

This PR introduces a new SIP that replaces the current follow-distance syncing approach with a finality-based mechanism post-fork, aiming to reduce reorg risks and align with Ethereum’s security model.

  • Proposes switching from probabilistic follow-distance to finalized-block syncing after a configurable fork epoch
  • Defines behavioral changes, transition flow, performance trade-offs, and migration steps
  • Includes diagrams, performance impact analysis, and open discussion points
Comments suppressed due to low confidence (3)

sips/finality_based_event_syncing.md:1

  • The Author field is currently a placeholder ('xxx'). Please replace it with the actual author name or remove the placeholder.
| xxx    | Finality-Based Event Syncing | Core     | open-for-discussion | 2025-05-27 |

sips/finality_based_event_syncing.md:47

  • [nitpick] The code name 'Alan' isn’t defined or explained elsewhere. Consider clarifying its meaning or using a more descriptive label for the pre-fork mode.
**Pre-Fork (Alan)**:

sips/finality_based_event_syncing.md:131

  • [nitpick] There’s an extra leading space before 'typically'. Removing it will keep list formatting consistent.
  typically 2 epochs, so ~12.8 minutes) represents a significant change.

…ime-based and computed as `finalized_block.timestamp - processed_block.timestamp` for improved accuracy

feat(finality_based_event_syncing.md): adjust event processing to occur only when latency is less than or equal to staleness threshold
feat(finality_based_event_syncing.md): redefine node health as `finalized_timestamp - processed_timestamp ≤ staleness_threshold` for clarity and consistency
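For illustration, a minimal sketch of the time-based staleness check these commits describe. The threshold value and names are placeholders, not the SSV node's actual configuration or API:

```go
package main

import (
	"fmt"
	"time"
)

// stalenessThreshold is a placeholder; the actual threshold would be
// implementation-defined.
const stalenessThreshold = 5 * time.Minute

// isHealthy applies the proposed definition of node health:
// finalized_timestamp - processed_timestamp <= staleness_threshold.
// Events would only be processed while this latency stays within the threshold.
func isHealthy(finalizedTimestamp, processedTimestamp time.Time) bool {
	latency := finalizedTimestamp.Sub(processedTimestamp)
	return latency <= stalenessThreshold
}

func main() {
	finalized := time.Now()
	processed := finalized.Add(-3 * time.Minute)
	fmt.Println(isHealthy(finalized, processed)) // true: 3m <= 5m threshold
}
```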
@GalRogozinski GalRogozinski mentioned this pull request Jun 29, 2025
Contributor

@GalRogozinski GalRogozinski left a comment


Very well written SIP :-)

I just don't think the staleness part should be here (I understand you crash the node if it is stale).
I am NOT saying you shouldn't implement this in the SSV node.
The thing is that I think this is a subjective approach an implementation can take, i.e. operators can rely on external monitoring tools to understand the health of their nodes.

So unless the Sigma Prime guys want to standardize this, it should be out.

…taleness check and health monitoring criteria for finality-based syncing
Collaborator

@dknopik dknopik left a comment


Looks good!

Comment on lines +154 to +160
1. **Pre-Fork**: Node operators must update their SSV node software to a version supporting the "FinalityConsensus" fork
well in advance of the announced activation epoch for their respective network. Monitor official announcements for
activation epoch details.
2. **During Fork Activation**: The transition should be automatic once the node's current epoch reaches the
"FinalityConsensus" activation epoch. Operators should monitor node logs for confirmation of the new operational mode.
3. **Post-Fork**: Verify that the node is processing events based on finalized blocks. Node operators may implement
their own monitoring solutions as needed.

This is very vague. When exactly do we stop syncing with the follow method? There are multiple ways this could be interpreted:

Let X be the first slot of the activation epoch. Let B(slot) be the head block of slot (which may be a block in a previous slot if slot is empty). Let E(slot) be the epoch of slot.

a) We simply stop syncing at X: the last block processed will be B(X - 1) - 8 until E(X - 1) finalizes.
b) We stop syncing after processing B(X - 1): basically syncing until the follow distance has caught up with the epoch end.

Which one do you mean?
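For concreteness, the two readings could be expressed roughly as follows (a hedged sketch only; the types are placeholders and the slot/block distinction is glossed over):

```go
package cutover

// Slot is a placeholder type for illustration.
type Slot = uint64

// Interpretation (a): follow-distance syncing simply stops once the node's
// current slot reaches X, the first slot of the activation epoch; anything
// still inside the follow-distance window is only picked up once E(X-1)
// finalizes.
func keepFollowSyncingA(currentSlot, x Slot) bool {
	return currentSlot < x
}

// Interpretation (b): follow-distance syncing continues until the head block
// of slot X-1 has been processed, i.e. until the follow distance has caught
// up with the end of the last pre-fork epoch.
func keepFollowSyncingB(lastProcessedSlot, x Slot) bool {
	return lastProcessedSlot < x-1
}
```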

Comment on lines +138 to +143
- **Enhanced Security**: Transitioning to cryptographic finality for event processing inherently strengthens the node
against state corruption or inconsistencies caused by execution layer reorganizations.
- **Trust Model**: Reliance shifts more explicitly towards the finality guarantees provided by Ethereum's consensus
layer, which is a core security assumption of the PoS network.
- **Attack Surface**: This change is expected to reduce the attack surface related to manipulating node state through EL
reorgs.

This does not mention the significant disadvantage that it becomes impossible to remove validators during long periods of non-finality. See my comment below for my ideas on how to mitigate this.

Comment on lines +84 to +97
graph TB
subgraph "Proposed"
A2[New Block] --> B2[Check Finality]
B2 --> C2[Wait Finalized]
C2 --> D2[Fetch Events]
D2 --> E2[Safe Process]
end

subgraph "Current"
A1[New Block] --> B1[Wait 8 Blocks]
B1 --> C1[Fetch Events]
C1 --> D1[Process]
D1 --> E1[RISK: Reorg]
end

This flow diagram includes nodes that are not really part of the flow ("Safe Process", "RISK: Reorg").

The trigger for each diagram is "New Block", which models the new process incorrectly IMO: "Check Finality" sounds like there are two possible outcomes (final, not final), but there is only one arrow.

I'd just put something like

```mermaid
graph TB
    subgraph "Proposed"
        A2[New finalized checkpoint] --> B2[Fetch events of new finalized blocks]
    end
```

to reflect that the trigger for sync progress is a new finalized checkpoint.
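For illustration, a minimal sketch of a sync loop driven by finalized checkpoints rather than new heads; the interfaces below are hypothetical, not the SSV node's actual API:

```go
package eventsync

import (
	"context"
	"log"
)

// FinalizedCheckpoint is a hypothetical notification carrying the highest
// finalized execution-layer block.
type FinalizedCheckpoint struct {
	Epoch          uint64
	FinalizedBlock uint64
}

// FinalizedCheckpointSource yields a value whenever a new checkpoint finalizes.
type FinalizedCheckpointSource interface {
	Subscribe(ctx context.Context) <-chan FinalizedCheckpoint
}

// EventFetcher fetches and processes contract events in an inclusive block range.
type EventFetcher interface {
	FetchAndProcess(ctx context.Context, fromBlock, toBlock uint64) error
}

// SyncFinalized advances event processing only when a new finalized
// checkpoint arrives, so no processed block can be reorged.
func SyncFinalized(ctx context.Context, src FinalizedCheckpointSource, fetcher EventFetcher, lastProcessed uint64) {
	for cp := range src.Subscribe(ctx) {
		if cp.FinalizedBlock <= lastProcessed {
			continue // nothing newly finalized
		}
		if err := fetcher.FetchAndProcess(ctx, lastProcessed+1, cp.FinalizedBlock); err != nil {
			log.Printf("finality-based event sync failed: %v", err)
			continue // retry on the next finalized checkpoint
		}
		lastProcessed = cp.FinalizedBlock
	}
}
```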

Collaborator

dknopik commented Jul 15, 2025

IMO, we can introduce some mechanism to mitigate the long non-finality issue a bit. Basically, we want to be able to remove validators in order to run them somewhere else, or at least exit them from the active validator set.

Always perform exit events

Process validator exit events regardless of finality. This is safe, since it does not change validator ownership: we do not have a slashing risk, as the validator is not removed from the cluster (and therefore cannot be re-added to another). Also, it does not matter if the event is reorged, as it does not affect the state.

Pending removal flag

Another idea is to immediately check all new blocks regardless of finality. For each validator removal in such a block, the validator is marked "pending removal". For all "pending removal" validators:

  • Do not perform duties
  • Do not forward messages where one of the following is true:
    • Message ID contains the validator
    • The message is a partial signature message and one of the contained partial signatures is for the validator

The only way to remove a "pending removal" flag is by actually removing the validator (i.e. finalizing a block with a remove validator event). This should be reasonably safe: The user can observe if the transaction causes the validator to stop. Even if the event reorgs, the operators will remember the flag, and not perform duties. Great care should be taken to persist this flag! Ideally, the node should remember it even if the user decides to resync the node, e.g. by storing this information in a separate file.
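A minimal sketch of what this pending-removal bookkeeping could look like, assuming hypothetical types rather than the SSV node's actual message or storage API (persistence is only hinted at in a comment):

```go
package pendingremoval

import "sync"

// ValidatorPubKey is a placeholder identifier type.
type ValidatorPubKey [48]byte

// Registry tracks validators flagged as "pending removal" after an
// unfinalized removal event is observed. The flag is only cleared once a
// removal event for the validator is seen in a finalized block.
type Registry struct {
	mu      sync.Mutex
	pending map[ValidatorPubKey]bool
}

func NewRegistry() *Registry {
	return &Registry{pending: make(map[ValidatorPubKey]bool)}
}

// MarkPending records a removal event observed in a not-yet-finalized block.
// Per the comment above, the flag should also be persisted (e.g. to a
// separate file) so it survives a node resync.
func (r *Registry) MarkPending(v ValidatorPubKey) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.pending[v] = true
}

// ClearOnFinalizedRemoval removes the flag once the removal event finalizes
// and the validator is actually removed from the node's state.
func (r *Registry) ClearOnFinalizedRemoval(v ValidatorPubKey) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.pending, v)
}

// SkipDuty reports whether duties for the validator must not be performed.
func (r *Registry) SkipDuty(v ValidatorPubKey) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.pending[v]
}

// DropMessage reports whether a message must not be forwarded, given the
// validators referenced by its message ID or by contained partial signatures.
func (r *Registry) DropMessage(referenced []ValidatorPubKey) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	for _, v := range referenced {
		if r.pending[v] {
			return true
		}
	}
	return false
}
```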

The disadvantage of this is that the user still cannot really use the validator differently, as the user does not have a slashing protection database for that validator. However, it is not impossible to e.g. get the slashing protection data from an operator and use it in a conventional validator client.

Note that it is not possible to make this 100% safe, as operators may lose this information, which is why this is not my favoured solution.

Disable validator config

This is similar to the above solution, except it is just support for disabling a certain validator in an operator via a configuration file. The operator would configure this at the request of the validator owner. This would be very helpful for private clusters.

Note that the paragraph regarding the slashing protection data from above still applies.

Happy to hear comments on these ideas from you!

Contributor

y0sher commented Aug 3, 2025


Hi @dknopik,
All of these ideas sound good to me. We were already considering the exit/removal issue, and mitigating it this way sounds reasonable: it keeps the robustness against re-orgs without hurting the UX or the ability to remove validators. Thanks for taking the time to review and add your input. We will update this PR and the code soon to reflect the changes.

I'm not sure the disable validator config feature has to be part of this, though. I understand why it might be a desired feature, but it doesn't have much to do with syncing.
