@kzalys kzalys commented Oct 31, 2025

For issue https://issues.apache.org/jira/browse/CASSANDRA-20995

Currently the auto-repair scheduler handles orphan node cleanup in the following flow:

  1. Check if the repair interval for the given repair type has passed.
  2. If so, attempt to clean up any orphaned nodes.

This means that any orphaned nodes in the cluster's auto-repair history are cleaned up only once per auto-repair interval. For repair types with a long repair interval (such as full repair), this can cause a buildup of orphaned nodes in the auto-repair history, which will eventually block auto-repair on the entire cluster.

This PR re-orders the operations to make sure that orphaned node cleanup is performed during every auto-repair loop, regardless of whether or not the repair interval has passed.
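
To illustrate the re-ordering, here is a minimal in-memory sketch of the two loop variants. It is not the actual Cassandra code: the class `AutoRepairLoopSketch`, its fields, and the methods `cleanupOrphans`, `runOldLoop`, and `runNewLoop` are hypothetical names, and the auto-repair history is modeled as a plain set of node IDs.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: none of these names come from the Cassandra code base.
final class AutoRepairLoopSketch {
    // Node IDs that have an entry in the auto-repair history (stand-in for the history table).
    final Set<String> history = new HashSet<>();
    // Node IDs that are currently part of the cluster.
    final Set<String> liveNodes = new HashSet<>();
    // Counts how many times a repair was started, for illustration only.
    int repairsStarted = 0;

    // Removes history entries whose node is no longer in the cluster.
    // Returns the number of orphaned entries removed.
    int cleanupOrphans() {
        int before = history.size();
        history.retainAll(liveNodes);
        return before - history.size();
    }

    // Old ordering: cleanup is gated behind the repair-interval check,
    // so it runs at most once per repair interval.
    void runOldLoop(boolean intervalPassed) {
        if (!intervalPassed) {
            return;
        }
        cleanupOrphans();
        repairsStarted++;
    }

    // New ordering: cleanup runs on every loop iteration,
    // and only the repair itself waits for the interval.
    void runNewLoop(boolean intervalPassed) {
        cleanupOrphans();
        if (!intervalPassed) {
            return;
        }
        repairsStarted++;
    }
}
```

In this sketch, a loop iteration that runs before the interval has passed leaves the orphaned entry in place under the old ordering but removes it under the new one, without starting a repair in either case.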

@kzalys kzalys changed the title Do not wait for repair interval to pass before cleaning up orphaned auto-repair data CASSANDRA-20995 Do not wait for repair interval to pass before cleaning up orphaned auto-repair data Oct 31, 2025
@smiklosovic

@jaydeepkumar1984 what do you think?

orphanFinish,
"NOT_MY_TURN"
), ConsistencyLevel.QUORUM);

Before starting the auto-repair, we should validate the entries for the orphan node and the three live nodes with a SELECT query.
