Conversation

@fionaliao (Contributor) commented Oct 3, 2025

fsyncs for the WAL/WBL do not happen every time a record is logged, see: prometheus/prometheus#5869

This means that if Mimir is shut down uncleanly, it's possible for there to be WAL corruption errors due to not all data being written to disk before shutdown.

Adding a FsyncSegmentsUntilCurrent() function which will ensure all segments up until the current one have been fsynced before returning. Also added a corresponding function for the tsdb head, so it can be called from Mimir code.
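
For context, here is a minimal sketch of the shape of the API being added, assuming a head-level wrapper that simply delegates to its write logs. Apart from `FsyncSegmentsUntilCurrent()`, the names below are illustrative rather than the actual mimir-prometheus code:

```go
package sketch

// WriteLog stands in for the WAL/WBL type; only the new method matters here.
type WriteLog interface {
	// FsyncSegmentsUntilCurrent blocks until every segment up to and
	// including the current one has been fsynced to disk.
	FsyncSegmentsUntilCurrent() error
}

// Head is a stand-in for the TSDB head.
type Head struct {
	wal WriteLog // write-ahead log
	wbl WriteLog // write-behind log for out-of-order samples; may be nil
}

// FsyncWriteLogs is a hypothetical head-level wrapper so callers such as
// Mimir don't have to reach into the head's write logs directly.
func (h *Head) FsyncWriteLogs() error {
	if h.wal != nil {
		if err := h.wal.FsyncSegmentsUntilCurrent(); err != nil {
			return err
		}
	}
	if h.wbl != nil {
		return h.wbl.FsyncSegmentsUntilCurrent()
	}
	return nil
}
```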

Corresponding Mimir PR: grafana/mimir#12816

@fionaliao force-pushed the fl/sync-segments-until-current branch 3 times, most recently from e0126a0 to 4715bad on October 8, 2025 at 17:24
@fionaliao (Contributor, Author) commented:

@bboreham replying here to comments from grafana/mimir#12816

I have updated the logic a bit from the version you reviewed - the current segment fsync is now queued through the actor channel. The mutex is held while queuing the fsync; the actual fsync operations happen asynchronously after the mutex is released.


> For consistency, you could do the fsync inside the actor function.
> I can't come up with any concrete reason why this would be better; it just seems nicer that all fsyncs would happen in order.

I think one reason is that this allows the fsync to be executed without holding the mutex, but still avoids race conditions where the segment could be closed before it's fsynced (the write log Close() function waits for the actor channel to drain before closing the current segment).
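
To make the ordering concrete, here is a rough sketch of the queuing pattern described above. The field names loosely mirror the Prometheus write log type, but this is an illustration of the idea, not the actual implementation (which, among other things, also has to handle already-completed segments):

```go
package sketch

import (
	"os"
	"sync"
)

// WL is a minimal stand-in for the write log; this is not the real type.
type WL struct {
	mtx     sync.Mutex
	segment *os.File    // stand-in for the currently open segment
	actorc  chan func() // serialises background work on segments
}

// FsyncSegmentsUntilCurrent sketches the approach described above: the fsync
// of the current segment is enqueued on the actor channel while the mutex is
// held, and executes asynchronously after the mutex has been released.
func (w *WL) FsyncSegmentsUntilCurrent() error {
	w.mtx.Lock()
	done := make(chan error, 1)
	w.actorc <- func() {
		// Runs later on the actor goroutine, without holding w.mtx.
		done <- w.segment.Sync()
	}
	w.mtx.Unlock()

	// Close() drains actorc before closing the current segment, so the
	// queued fsync cannot run against an already-closed file.
	return <-done
}
```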


From grafana/mimir#12816 (comment):

> How can this happen?

I'm not sure what this refers to - w.segment being nil? w.segment changing or being closed?

The review comments below were left on this code:

```go
}
}

w.mtx.Unlock()
```
Reviewer (Contributor) commented:

I'm not sure about where we acquire and release the lock.

The potential problems I can think of right now are:

  1. There's a race between taking a copy of w.segment, unlocking the mutex and running the fsync. It's possible another goroutine replaced w.segment, so at that point we no longer have the current segment fsync'd. It may not be a huge problem since there's still the same race right before returning from FsyncSegmentsUntilCurrent: we can get a new segment.
  2. We end up with a deadlock where pushing to the channel blocks because the function currently executing from actorc is trying to acquire the lock. This currently doesn't happen, but I don't see anything that guarantees it can't.

I don't think it's possible to solve both. From reading the only other place which uses actorc, and the places which do operations on the WL while holding the lock, I think that 2. is assumed to never happen. So I think we should release the lock only after the done channel is closed.
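
For comparison, a sketch of the ordering suggested here, reusing the illustrative `WL` stand-in from the earlier sketch (again, not the real code): the lock is released only once the queued fsync has completed, so `w.segment` cannot be replaced in between.

```go
// fsyncCurrentSegmentLocked is a hypothetical variant that keeps holding the
// mutex until the queued fsync has finished.
func (w *WL) fsyncCurrentSegmentLocked() error {
	w.mtx.Lock()
	defer w.mtx.Unlock()

	done := make(chan error, 1)
	w.actorc <- func() {
		// Must not try to acquire w.mtx here, or it would deadlock with the
		// caller, which is still holding the lock (problem 2 above).
		done <- w.segment.Sync()
	}
	// Only return (and unlock) after the fsync has completed.
	return <-done
}
```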

The reviewer (Contributor) added:

Something I forgot: if you end up taking a copy of w.segment, then you should do that while holding the lock inside the actor func instead of before creating the actor closure; this will help avoid races.
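
A small sketch of that variant, again using the illustrative `WL` stand-in: the closure reads `w.segment` under the lock at execution time, rather than capturing a copy when the closure is created. This only applies to the version where the caller releases the lock before the fsync runs.

```go
// fsyncCurrentSegmentCopyInside is a hypothetical variant where the segment
// is read inside the actor function, under the lock.
func (w *WL) fsyncCurrentSegmentCopyInside() error {
	done := make(chan error, 1)
	w.actorc <- func() {
		w.mtx.Lock()
		seg := w.segment // read the *current* segment at execution time
		w.mtx.Unlock()
		// Ordering through actorc is what prevents syncing a segment that
		// has already been closed (see the Close() discussion above).
		done <- seg.Sync()
	}
	return <-done
}
```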

@fionaliao enabled auto-merge (squash) October 10, 2025 15:19
@fionaliao merged commit 34fc47d into main Oct 10, 2025
28 checks passed
@fionaliao deleted the fl/sync-segments-until-current branch October 10, 2025 15:29
fionaliao added a commit to grafana/mimir that referenced this pull request Oct 10, 2025

#### What this PR does

This PR introduces the
`-ingest-storage.write-logs-fsync-before-kafka-commit-enabled` flag,
enabled by default, to make sure all WAL/WBL segment files are fsynced
before the corresponding offset is committed in Kafka.

Currently unclean ingester shutdowns can cause data loss, as write log
files aren't guaranteed to be fsynced before the corresponding offsets
are committed. This means that when an ingester restarts, it could see
corrupted WAL data (and therefore discard it), but not replay all the
required data from Kafka as it only resumes from the committed offset.

An additional
`-ingest-storage.write-logs-fsync-before-kafka-commit-concurrency` flag
has been added so we can reduce the time taken to do the fsyncs (and
therefore the time taken to commit an offset).
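
A rough sketch of the intended ordering, with bounded fsync concurrency. All types and function names below are placeholders rather than Mimir's actual API; only the ordering (fsync every write log, then commit the offset) reflects what this PR does:

```go
package sketch

import "golang.org/x/sync/errgroup"

// tsdbHead is a placeholder for a per-tenant TSDB head exposing the new
// fsync hook from grafana/mimir-prometheus#998.
type tsdbHead interface {
	FsyncWriteLogs() error
}

// offsetCommitter is a placeholder for the Kafka offset committer.
type offsetCommitter interface {
	CommitOffset(offset int64) error
}

// commitWithFsync fsyncs the write logs of every open TSDB with bounded
// concurrency and only then commits the Kafka offset, so a committed offset
// always implies the corresponding WAL/WBL data is on disk.
func commitWithFsync(heads []tsdbHead, committer offsetCommitter, offset int64, concurrency int) error {
	var g errgroup.Group
	// Bounded by -ingest-storage.write-logs-fsync-before-kafka-commit-concurrency
	// (assumed > 0 here).
	g.SetLimit(concurrency)
	for _, h := range heads {
		g.Go(h.FsyncWriteLogs)
	}
	if err := g.Wait(); err != nil {
		return err // don't commit the offset if any fsync failed
	}
	return committer.CommitOffset(offset)
}
```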

A possible optimisation is to only fsync TSDBs which have had updates since
the last offset was committed; we'll see how the current version runs in
production and decide whether that's worth adding.

Depends on grafana/mimir-prometheus#998

#### Checklist

- [x] Tests updated.
- [x] Documentation added.
- [x] `CHANGELOG.md` updated - the order of entries should be
`[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`. If changelog entry
is not needed, please add the `changelog-not-needed` label to the PR.
- [ ]
[`about-versioning.md`](https://github.com/grafana/mimir/blob/main/docs/sources/mimir/configure/about-versioning.md)
updated with experimental features. (didn't update as
`-ingest-storage.*` is already mentioned as a whole)

---------

Co-authored-by: Dimitar Dimitrov <[email protected]>