Skip to content

Conversation

jbowens
Copy link
Collaborator

@jbowens jbowens commented Aug 12, 2025

Adapt Checkpoint to copy a consistent snapshot of the database. Previously Checkpoint acquired a Version, ensuring a consistent snapshot of the sstables within the version, but copied the WALs arbitrarily. This allowed Checkpoint to race with inflight WAL writes and allowed for a few sources of inconsistency in checkpoints:

  • The checkpointing goroutine racing with the LogWriter could copy an old block from a WAL file's recycled history, followed by a new block containing recent writes. If a chunk in the new block indicated that the old block should've been durably synced, the WAL was considered corrupt on replay when the checkpoint was opened.
  • The checkpointing goroutine could copy WAL records containing batches that were committed after version edits (eg, ingests, excises) that were committed after the checkpointed Version. This would violate the checkpoint's consistency.

Now Checkpoint reads the current visible sequence number and only copies the prefix of the WAL that is visible at the sequence number. This ensures that a Checkpoint's snapshot isolation semantics mirror that of an Iterator.

Informs cockroachdb/cockroach#150660.

@jbowens jbowens requested a review from a team as a code owner August 12, 2025 17:29
@jbowens jbowens requested a review from annrpom August 12, 2025 17:29
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: 0 of 11 files reviewed, 3 unresolved discussions (waiting on @annrpom)


wal/reader.go line 392 at r1 (raw file):

// Copy does NOT sync the destination directory, and the caller must explicitly
// sync it if they require the new WAL file to be durably linked.
func Copy(

Should we mention that we allow the logical log to be updated synchronously?


wal/reader.go line 415 at r1 (raw file):

		// A batch assigns sequence numbers beginning at the header's SeqNum,
		// increasing for successive internal keys. We only copy the batch if
		// the entirety of the batch's keys are visible.

Is it possible to have a batch that has some visible and some non-visible keys? There would be no way to get an "atomic" WAL for the given sequence key.. Should we error out or assert in this case?


wal/reader.go line 428 at r1 (raw file):

		}
	}
	if err := w.Close(); err != nil {

[nit] Add a comment mentioning that Close() syncs the file

Adapt Checkpoint to copy a consistent snapshot of the database. Previously
Checkpoint acquired a Version, ensuring a consistent snapshot of the sstables
within the version, but copied the WALs arbitrarily. This allowed Checkpoint to
race with inflight WAL writes and allowed for a few sources of inconsistency
in checkpoints:

- The checkpointing goroutine racing with the LogWriter could copy an old block
  from a WAL file's recycled history, followed by a new block containing recent
  writes. If a chunk in the new block indicated that the old block should've
  been durably synced, the WAL was considered corrupt on replay when the
  checkpoint was opened.
- The checkpointing goroutine could copy WAL records containing batches that
  were committed after version edits (eg, ingests, excises) that were committed
  after the checkpointed Version. This would violate the checkpoint's
  consistency.

Now Checkpoint reads the current visible sequence number and only copies the
prefix of the WAL that is visible at the sequence number. This ensures that a
Checkpoint's snapshot isolation semantics mirror that of an Iterator.
@jbowens jbowens force-pushed the checkpoint-isolation branch from 084b89b to cf4cdc1 Compare August 14, 2025 15:06
Copy link
Collaborator Author

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: 0 of 11 files reviewed, 1 unresolved discussion (waiting on @annrpom and @RaduBerinde)


wal/reader.go line 392 at r1 (raw file):

Previously, RaduBerinde wrote…

Should we mention that we allow the logical log to be updated synchronously?

Done


wal/reader.go line 415 at r1 (raw file):

Previously, RaduBerinde wrote…

Is it possible to have a batch that has some visible and some non-visible keys? There would be no way to get an "atomic" WAL for the given sequence key.. Should we error out or assert in this case?

Good call—added an errors.AssertionFailedf error case.


wal/reader.go line 428 at r1 (raw file):

Previously, RaduBerinde wrote…

[nit] Add a comment mentioning that Close() syncs the file

Done

@jbowens jbowens merged commit e8604b9 into cockroachdb:master Aug 18, 2025
8 checks passed
@jbowens jbowens deleted the checkpoint-isolation branch August 18, 2025 16:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants