fsync write logs before offset is committed #12816

fionaliao · 2025-09-25T17:45:37Z

What this PR does

This PR introduces the -ingest-storage.write-logs-fsync-before-kafka-commit-enabled flag, enabled by default, to make sure all WAL/WBL segment files are fsynced before the corresponding offset is committed in Kafka.

Currently, unclean ingester shutdowns can cause data loss because write log files aren't guaranteed to be fsynced before the corresponding offsets are committed. When an ingester restarts, it could find corrupted WAL data (and therefore discard it) but not replay all the required data from Kafka, as it only resumes from the committed offset.

An additional -ingest-storage.write-logs-fsync-before-kafka-commit-concurrency flag has been added so that fsyncs can run concurrently, reducing the time they take (and therefore the time to commit an offset).

A possible optimisation is to only fsync TSDBs that have had updates since the last offset was committed; we'll see how the current version runs in production and decide whether it's worth adding.

Depends on grafana/mimir-prometheus#998

Checklist

Tests updated.
Documentation added.
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]. If changelog entry is not needed, please add the changelog-not-needed label to the PR.
about-versioning.md updated with experimental features. (didn't update as -ingest-storage.* is already mentioned as a whole)

bboreham · 2025-10-07T13:52:08Z

vendor/github.com/prometheus/prometheus/tsdb/wlog/wlog.go

+	// fsync current segment within mutex to avoid race conditions where the segment could be updated or closed
+	if w.segment != nil {
+		err = w.fsync(w.segment)
+	}


fsync can take a long time - can you take a copy of w.segment and do it outside?
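
One way to read this suggestion, sketched under assumptions: grab the segment reference while holding the mutex, release the mutex, and only then fsync. The types below (`segment`, `countingSegment`, `writeLog`) and the `syncCurrent` method are hypothetical and are not the Prometheus `wlog` types.

```go
package main

import (
	"errors"
	"sync"
)

// segment is a hypothetical stand-in for a write log segment file.
type segment interface {
	Sync() error
}

// countingSegment is a fake segment used only for demonstration.
type countingSegment struct {
	syncs int
}

func (s *countingSegment) Sync() error {
	s.syncs++
	return nil
}

// writeLog is a simplified, hypothetical write log holding one active segment.
type writeLog struct {
	mtx     sync.Mutex
	closed  bool
	segment segment
}

// syncCurrent copies the segment reference under the mutex and performs the
// (potentially slow) fsync after releasing it, so writers are not blocked.
func (w *writeLog) syncCurrent() error {
	w.mtx.Lock()
	if w.closed {
		w.mtx.Unlock()
		return errors.New("unable to fsync segments: write log is closed")
	}
	seg := w.segment
	w.mtx.Unlock()

	if seg == nil {
		return nil // nothing to sync
	}
	// The mutex is no longer held here: another goroutine may rotate or close
	// the segment concurrently, so a real implementation has to tolerate an
	// error from Sync in that case.
	return seg.Sync()
}
```

The trade-off is the one the diff comment points at: holding the mutex avoids racing with a segment update or close, while fsyncing outside it keeps writers from stalling on a slow disk.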

bboreham · 2025-10-07T14:06:17Z

vendor/github.com/prometheus/prometheus/tsdb/wlog/wlog.go

+	}
+
+	done := make(chan struct{})
+	// all previous segments before w.segment should either have been fsynced and closed or still in the actorc queue


For consistency, you could do the fsync inside the actor function.
I can't come up with any concrete reason why this would be better; it just seems nicer that all fsyncs would happen in order.

bboreham · 2025-10-07T14:06:29Z

vendor/github.com/prometheus/prometheus/tsdb/wlog/wlog.go

+		return errors.New("unable to fsync segments: write log is closed")
+	}
+	// fsync current segment within mutex to avoid race conditions where the segment could be updated or closed
+	if w.segment != nil {


How can this happen?

fionaliao · 2025-10-08T18:25:19Z

@bboreham - thanks for taking a look :) I've applied/answered your comments in the mimir-prometheus PR instead: grafana/mimir-prometheus#998 (easier to update the prometheus code there if there are additional comments)

dimitarvdimitrov

LGTM, left a few minor comments

pkg/storage/ingest/config.go

pkg/ingester/ingester.go

dimitarvdimitrov · 2025-10-10T13:57:21Z

pkg/storage/ingest/pusher.go

+}
+
+type PreCommitNotifier interface {
+	NotifyPreCommit(ctx context.Context) error


can you document a little bit how this interface is supposed to be implemented... or perhaps what's a good thing to put behind this interface and then how it's going to be invoked

Done:

mimir/pkg/storage/ingest/pusher.go, lines 34 to 37 in c685d49:

	// NotifyPreCommit is called before committing a Kafka offset to allow for
	// synchronization or cleanup operations. The offset to commit is determined before this call.
	// The committer waits for this method to complete before proceeding with the actual
	// commit to Kafka.

fionaliao force-pushed the fl/sync-wl-pre-kafka-commit branch 7 times, most recently from 4a5bbb8 to 6832566 Compare October 3, 2025 18:11

fionaliao mentioned this pull request Oct 6, 2025

Add function to fsync all segments until the current one grafana/mimir-prometheus#998

Merged

bboreham reviewed Oct 7, 2025

fionaliao force-pushed the fl/sync-wl-pre-kafka-commit branch from c6a5c7d to d67477d Compare October 8, 2025 17:59

fionaliao changed the title from "[WIP] fsync write logs before offset is committed" to "fsync write logs before offset is committed" Oct 8, 2025

fionaliao force-pushed the fl/sync-wl-pre-kafka-commit branch from 3f584c6 to ddda6f0 Compare October 10, 2025 09:16

dimitarvdimitrov reviewed Oct 10, 2025

dimitarvdimitrov approved these changes Oct 10, 2025

fionaliao and others added 9 commits October 10, 2025 19:44

Fsync write logs before Kafka commit

148becc

Add concurrency

5276d57

Test with multiple users

8a5840b

Reduce concurrency to 1 by default

a32e4b1

Set fsync log as debug

eee3d0a

Update docs

c537624

Cleanup

d4ba4ef

Update pkg/storage/ingest/config.go

cef1a06

Co-authored-by: Dimitar Dimitrov <[email protected]>

Apply review comments

7b5438f

fionaliao force-pushed the fl/sync-wl-pre-kafka-commit branch from c685d49 to 7b5438f Compare October 10, 2025 18:50

fionaliao marked this pull request as ready for review October 10, 2025 18:50

fionaliao requested review from a team and tacole02 as code owners October 10, 2025 18:50

fionaliao merged commit ff99104 into main Oct 10, 2025
39 checks passed

fionaliao deleted the fl/sync-wl-pre-kafka-commit branch October 10, 2025 19:09
