Skip to content

Conversation

fionaliao
Copy link
Contributor

@fionaliao fionaliao commented Sep 25, 2025

What this PR does

This PR introduces the -ingest-storage.write-logs-fsync-before-kafka-commit-enabled flag, enabled by default, to make sure all WAL/WBL segment files are fsynced before the corresponding offset is committed in Kafka.

Currently unclean ingester shutdowns can cause data loss, as write log files aren't guaranteed to be fsynced before the corresponding offsets are committed. This means that when an ingester restarts, it could see corrupted WAL data (and therefore discard it), but not replay all the required data from Kafka as it only resumes from the committed offset.

An additional -ingest-storage.write-logs-fsync-before-kafka-commit-concurrency flag has been added to enable us to reduce the amount of time to do the fsyncs (and therefore the amount of time to commit an offset).

An optimisation is to only fsync tsdbs which have had updates since the last offset was committed; we'll see how the current version runs in production and decide whether that's worth adding it.

Depends on grafana/mimir-prometheus#998

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]. If changelog entry is not needed, please add the changelog-not-needed label to the PR.
  • about-versioning.md updated with experimental features. (didn't update as -ingest-storage.* is already mentioned as a whole)

Comment on lines 1057 to 1060
// fsync current segment within mutex to avoid race conditions where the segment could be updated or closed
if w.segment != nil {
err = w.fsync(w.segment)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fsync can take a long time - can you take a copy of w.segment and do it outside?

}

done := make(chan struct{})
// all previous segments before w.segment should either have been fsynced and closed or still in the actorc queue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency, you could do the fsync inside the actor function.
I can't come up with any concrete reason why this would be better; it just seems nicer that all fsyncs would happen in order.

return errors.New("unable to fsync segments: write log is closed")
}
// fsync current segment within mutex to avoid race conditions where the segment could be updated or closed
if w.segment != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can this happen?

@fionaliao fionaliao force-pushed the fl/sync-wl-pre-kafka-commit branch from c6a5c7d to d67477d Compare October 8, 2025 17:59
Copy link
Contributor

github-actions bot commented Oct 8, 2025

💻 Deploy preview deleted.

@fionaliao fionaliao changed the title [WIP] fsync write logs before offset is committed fsync write logs before offset is committed Oct 8, 2025
@fionaliao
Copy link
Contributor Author

@bboreham - thanks for taking a look :) I've applied/answered your comments in the mimir-prometheus PR instead: grafana/mimir-prometheus#998 (easier to update the prometheus code there if there are additional comments)

@fionaliao fionaliao force-pushed the fl/sync-wl-pre-kafka-commit branch from 3f584c6 to ddda6f0 Compare October 10, 2025 09:16
Copy link
Contributor

@dimitarvdimitrov dimitarvdimitrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, left a few minor comments

}

type PreCommitNotifier interface {
NotifyPreCommit(ctx context.Context) error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you document a little bit how this interface is supposed to be implemented... or perhaps what's a good thing to put behind this interface and then how it's going to be invoked

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done:

// NotifyPreCommit is called before committing a Kafka offset to allow for
// synchronization or cleanup operations. The offset to commit is determined before this call.
// The committer waits for this method to complete before proceeding with the actual
// commit to Kafka.

@fionaliao fionaliao force-pushed the fl/sync-wl-pre-kafka-commit branch from c685d49 to 7b5438f Compare October 10, 2025 18:50
@fionaliao fionaliao marked this pull request as ready for review October 10, 2025 18:50
@fionaliao fionaliao requested review from a team and tacole02 as code owners October 10, 2025 18:50
@fionaliao fionaliao merged commit ff99104 into main Oct 10, 2025
39 checks passed
@fionaliao fionaliao deleted the fl/sync-wl-pre-kafka-commit branch October 10, 2025 19:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants