Skip to content

Conversation

yacovm
Copy link
Collaborator

@yacovm yacovm commented Oct 5, 2025

This commit prevents to redeem a node in a given orbit if the node has been lastly suspected in the same orbit.

The intent is to prevent a situation where a node momentarily loses connection in the round it is supposed to propose a block, it is then blacklisted after a few rounds, and the nodes then redeem it when they build blocks notarized on top of the blocks where it is redeemed in. However, since the nodes use the finalized blocks and not the notarized blocks to know whether to time out immediately on a blacklisted node or not, they might have not finalized the block where it has been redeemed in by time, and therefore they immediately time out. Since they immediately time out, the node is then blacklisted again, as it has been redeemed right before it had a chance to propose a block.

This commit prevents this problem, by preventing a node to be redeemed in the same round it was blacklisted in.

This prevents the node from being blacklisted in the orbit successive to the orbit it has been blacklisted in, because a blacklisted node cannot be blacklisted again, and once a node is redeemed in an orbit, it will not be blacklisted in that orbit again:

For example, node 2 out of 0,1,2,3 is supposed to propose a block in round 6, and then in round 10. If round 6 times out, it will not be redeemed in rounds 7 to 9, and it can start be redeemed in round 11. It won't be blacklisted in the orbit corresponding to round 10, because it is already blacklisted in that orbit, and we only vote to blacklist a node if it was not blacklisted in the round it was supposed to propose a block in.

This commit prevents to redeem a node in a given orbit if the node has been lastly suspected in the same orbit.

The intent is to prevent a situation where a node momentarily loses connection in the round it is supposed to propose a block,
it is then blacklisted after a few rounds, and the nodes then redeem it when they build blocks notarized on top of the blocks where it is redeemed in.
However, since the nodes use the finalized blocks and not the notarized blocks to know whether to time out immediately on a blacklisted node or not,
they might have not finalized the block where it has been redeemed in by time, and therefore they immediately time out.
Since they immediately time out, the node is then blacklisted again, as it has been redeemed right before it had a chance to propose a block.

This commit prevents this problem, by preventing a node to be redeemed in the same round it was blacklisted in.

This prevents the node from being blacklisted in the orbit successive to the orbit it has been blacklisted in,
because a blacklisted node cannot be blacklisted again, and once a node is redeemed in an orbit, it will not be
blacklisted in that orbit again:

For example, node 2 out of 0,1,2,3 is supposed to propose a block in round 6, and then in round 10.
If round 6 times out, it will not be redeemed in rounds 7 to 9, and it can start be redeemed in round 11.
It won't be blacklisted in the orbit corresponding to round 10, because it is already blacklisted in that orbit,
and we only vote to blacklist a node if it was not blacklisted in the round it was supposed to propose a block in.

Signed-off-by: Yacov Manevich <[email protected]>
@samliok
Copy link
Collaborator

samliok commented Oct 7, 2025

can we update TestReplicationNodeDiverges and remove this line

t.Skip("flaky test, should re-enable with the fix for https://github.com/ava-labs/Simplex/issues/262")

The fix should make the test not flakey, although we'll need to slightly update the assertions of the test. Maybe for round numBlocks + 1 (round 6) we should assert an empty notarization. then assert node 1 can build a block for round 12.

Copy link
Collaborator

@samliok samliok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved, but i'd like to fix TestReplicationNodeDiverges either in this pr or the next. Let me know if you'd like me to do it in a follow up, since I've debugged the test for some time(although it should be fairly straightforward)

@yacovm
Copy link
Collaborator Author

yacovm commented Oct 7, 2025

Approved, but i'd like to fix TestReplicationNodeDiverges either in this pr or the next. Let me know if you'd like me to do it in a follow up, since I've debugged the test for some time(although it should be fairly straightforward)

Yeah I totally forgot about the test, good call.

@yacovm yacovm merged commit 4926c5f into main Oct 7, 2025
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants