-
Notifications
You must be signed in to change notification settings - Fork 653
fsync write logs before offset is committed #12816
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
4a5bbb8
to
6832566
Compare
// fsync current segment within mutex to avoid race conditions where the segment could be updated or closed | ||
if w.segment != nil { | ||
err = w.fsync(w.segment) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fsync can take a long time - can you take a copy of w.segment
and do it outside?
} | ||
|
||
done := make(chan struct{}) | ||
// all previous segments before w.segment should either have been fsynced and closed or still in the actorc queue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For consistency, you could do the fsync
inside the actor function.
I can't come up with any concrete reason why this would be better; it just seems nicer that all fsyncs would happen in order.
return errors.New("unable to fsync segments: write log is closed") | ||
} | ||
// fsync current segment within mutex to avoid race conditions where the segment could be updated or closed | ||
if w.segment != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can this happen?
c6a5c7d
to
d67477d
Compare
💻 Deploy preview deleted. |
@bboreham - thanks for taking a look :) I've applied/answered your comments in the mimir-prometheus PR instead: grafana/mimir-prometheus#998 (easier to update the prometheus code there if there are additional comments) |
3f584c6
to
ddda6f0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, left a few minor comments
} | ||
|
||
type PreCommitNotifier interface { | ||
NotifyPreCommit(ctx context.Context) error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you document a little bit how this interface is supposed to be implemented... or perhaps what's a good thing to put behind this interface and then how it's going to be invoked
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done:
mimir/pkg/storage/ingest/pusher.go
Lines 34 to 37 in c685d49
// NotifyPreCommit is called before committing a Kafka offset to allow for | |
// synchronization or cleanup operations. The offset to commit is determined before this call. | |
// The committer waits for this method to complete before proceeding with the actual | |
// commit to Kafka. |
Co-authored-by: Dimitar Dimitrov <[email protected]>
c685d49
to
7b5438f
Compare
What this PR does
This PR introduces the
-ingest-storage.write-logs-fsync-before-kafka-commit-enabled
flag, enabled by default, to make sure all WAL/WBL segment files are fsynced before the corresponding offset is committed in Kafka.Currently unclean ingester shutdowns can cause data loss, as write log files aren't guaranteed to be fsynced before the corresponding offsets are committed. This means that when an ingester restarts, it could see corrupted WAL data (and therefore discard it), but not replay all the required data from Kafka as it only resumes from the committed offset.
An additional
-ingest-storage.write-logs-fsync-before-kafka-commit-concurrency
flag has been added to enable us to reduce the amount of time to do the fsyncs (and therefore the amount of time to commit an offset).An optimisation is to only fsync tsdbs which have had updates since the last offset was committed; we'll see how the current version runs in production and decide whether that's worth adding it.
Depends on grafana/mimir-prometheus#998
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]
. If changelog entry is not needed, please add thechangelog-not-needed
label to the PR.about-versioning.md
updated with experimental features. (didn't update as-ingest-storage.*
is already mentioned as a whole)