Skip to content

Conversation

jordanschalm
Copy link
Member

@jordanschalm jordanschalm commented Oct 2, 2025

This PR addresses #7758, which describes a potential race condition (if concurrent cluster block finalization and extension are allowed in the future). Currently concurrent finalization and extension are not allowed, so this race condition cannot happen.

it is difficult to replicate the race condition we are testing because:
- it is a race condition
- the race condition is currently disallowed by our locking granularity

The test case triggers the condition by manually modifying the database
to only contain the reference block height index.
@codecov-commenter
Copy link

codecov-commenter commented Oct 2, 2025

Codecov Report

❌ Patch coverage is 83.33333% with 3 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
state/cluster/badger/mutator.go 83.33% 2 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

@jordanschalm jordanschalm changed the title Jord/7758 tx dedup forking Address transaction de-duplication race condition Oct 2, 2025
@jordanschalm jordanschalm marked this pull request as ready for review October 2, 2025 16:30
@jordanschalm jordanschalm requested a review from a team as a code owner October 2, 2025 16:30
Copy link
Member

@durkmurder durkmurder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit confused about this logic. It's hard to grasp it without prior experience with this. I think the difficulty is in LookupClusterBlocksByReferenceHeightRange. It returns a list of blocks for height range, and then we try to find out on which fork it is. What if we use fork traversal logic from fork. package and traverse the fork directly. Do you think it can simplify the logic in this function?

@jordanschalm
Copy link
Member Author

jordanschalm commented Oct 6, 2025

What if we use fork traversal logic from fork. package and traverse the fork directly

@durkmurder Do you mean replace the reference_block_height -> cluster_block_id index traversal with a traversal of the fork on the cluster chain? That doesn't work well, because you don't know where in the fork you should terminate:

  • The fork we are traversing is in the cluster state
  • But the depth of our traversal is determined by the ReferenceBlockID field (the height of the reference block on the main chain), not the height on the cluster chain
  • There isn't a strict relationship between the cluster height and the reference block height.
  • In addition, if we traverse the cluster chain directly we may traverse some extra blocks (with non-overlapping reference heights) that we don't need to check

For example, suppose I need to traverse the reference height range $H_{min}-H_{max}$. If I traverse the cluster chain ancestry until I find a reference block height $\lt H_{min}$, I still can't stop because there can be older cluster blocks with new reference blocks. Because of this, we store an index mapping the reference block height to the cluster ID. Then I can find all the cluster blocks corresponding to a reference block height range by traversing the index.

That index is stored for finalized blocks, then we use the standard fork traversal for the un-finalized fork. So we check the fork directly for candidate $C$ down to finalized block $F$, then we use the index for any finalized blocks $F$ and older with overlapping reference heights.

That is how the implementation currently works. Then, this PR is addressing the specific race condition described in #7758. Essentially, if finalization is occurring concurrently with extension, then a child of $F$ $F_{child}$ could be finalized between when we check the un-finalized and finalized fork (this is particularly a problem with Pebble, because of the read-committed isolation). If that happens, and if $F_{child}$ is not on the same fork as $C$, then we might incorrectly flag a duplicate transaction. This PR adds a check to reject any such $F_{child}$ s.

Copy link
Member

@durkmurder durkmurder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation. It makes sense.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants