-
Notifications
You must be signed in to change notification settings - Fork 11
Add function to fsync all segments until the current one #998
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
e0126a0
to
4715bad
Compare
@bboreham replying here to comments from grafana/mimir#12816 I have updated the logic a bit from the version you reviewed - the current segment fsync is now queued through the actor channel. The mutex is held while queuing the - the actual fsync operations happen asynchronously after the mutex is released.
I think one reason is that this allows the fsync to be executed without holding the mutex but still avoids race conditions where the segment could be closed before it's fsynced (the write log Close() function waits for the actor channel to drain before closing the current segment)
I'm not sure what this refers too - w.segement being nil? w.segment changing or being closed? |
4715bad
to
4fad846
Compare
} | ||
} | ||
|
||
w.mtx.Unlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm not sure about where we acquire and release the lock.
The potential problems i can think of right now are
- there's a race between taking a copy of w.segment, unlocking the mutex and running the fsync. it's possible another goroutine replaced the w.segment; so at that point we no longer have the current segment fsync'd. it may not be a huge problem since, there's still the same race right before returning from
FsyncSegmentsUntilCurrent
- we can get a new segment. - We end up with a deadlock where pushing to the channel blocks because the currently executed function from actorc is trying to acquire the lock. This currently doesn't happen, but i don't see
i don't think it's possible to solve both. From reading the only other place which uses actorc and the places which do operations on the WL while holding the lock, I think that 2. is assumed to never happen. So I think we should release the lock only after getting done
closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
something i forgot - if you end up taking a copy of w.segment, then you should do that while holding a lock inside the actor func instead of before creating the actor closure - this will help avoid races
<!-- Thanks for sending a pull request! Before submitting: 1. Read our CONTRIBUTING.md guide 2. Rebase your PR if it gets out of sync with main --> #### What this PR does This PR introduces the `-ingest-storage.write-logs-fsync-before-kafka-commit-enabled` flag, enabled by default, to make sure all WAL/WBL segment files are fsynced before the corresponding offset is committed in Kafka. Currently unclean ingester shutdowns can cause data loss, as write log files aren't guaranteed to be fsynced before the corresponding offsets are committed. This means that when an ingester restarts, it could see corrupted WAL data (and therefore discard it), but not replay all the required data from Kafka as it only resumes from the committed offset. An additional `-ingest-storage.write-logs-fsync-before-kafka-commit-concurrency` flag has been added to enable us to reduce the amount of time to do the fsyncs (and therefore the amount of time to commit an offset). An optimisation is to only fsync tsdbs which have had updates since the last offset was committed; we'll see how the current version runs in production and decide whether that's worth adding it. Depends on grafana/mimir-prometheus#998 #### Checklist - [x] Tests updated. - [x] Documentation added. - [x] `CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`. If changelog entry is not needed, please add the `changelog-not-needed` label to the PR. - [ ] [`about-versioning.md`](https://github.com/grafana/mimir/blob/main/docs/sources/mimir/configure/about-versioning.md) updated with experimental features. (didn't update as `-ingest-storage.*` is already mentioned as a whole) --------- Co-authored-by: Dimitar Dimitrov <[email protected]>
fsyncs for the WAL/WBL do not happen every time a record is logged, see: prometheus/prometheus#5869
This means that if Mimir is shut down uncleanly, it's possible for there to be WAL corruption errors due to not all data being written to disk before shutdown.
Adding a
FsyncSegmentsUntilCurrent()
function which will ensure all segments up until the current one have been fsycned before returning. Also added a corresponding function for the tsdb head, so it can be called from Mimir code.Corresponding Mimir PR: grafana/mimir#12816