-
Notifications
You must be signed in to change notification settings - Fork 11
storage.BatchSeriesReferencer
to create all series in a single call
#860
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft
colega
wants to merge
11
commits into
main
Choose a base branch
from
batch-appender
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
e0efb52
to
bf5711f
Compare
storage.BatchAppender
with BatchSeriesRef
to create all series in a single callstorage.BatchAppender
with BatchSeriesRefs
to create all series in a single call
storage.BatchAppender
with BatchSeriesRefs
to create all series in a single callstorage.BatchSeriesReferencer
to create all series in a single call
`storage.BatchAppender` adds the BatchSeriesRefs method to the Appender(). This method can be used to make sure that all the series from the incoming scrape or write request already exist (not wired here, as I'm implementing it for Mimir, but can be used for Prometheus as well). The reason to add this is to take advantage of a single operation to reduce locking time on MemPostings among others: right now during Mimir startup we have to call `Add()` millions of times within a few seconds, which causes lots of lock contention, but most of the time is actually spent locking & unlocking rather than adding series. This method does not implement the optimization yet: it only focuses on the exposed interface. I couldn't make Head.Appender() return a BatchAppender because it returns the initAppender which initializes the head on the first append call: if I wanted initAppender to implement BatchAppender we would need to add a timestamp to BatchSeriesRefs, which doesn't really belong here. So we are exposing HeadInitialized() and InitializeHead() methods and move the initialization responisibility to the caller, so it can be sure that BatchAppender() method has always an initialized appender. Signed-off-by: Oleg Zaytsev <[email protected]>
This goes one step further, and does batch series creation in Appender.BatchSeriesRef as well as creates a new implementation of getOrCreateBatch(). One of the advantages of the getOrCreateBatch() is that it's now adding new seires to MemPostings in a batch, probably in a more sorted fashion than previous implementation. MemPostings.AddBatch was added but it trivially uses Add() so far. Signed-off-by: Oleg Zaytsev <[email protected]>
This implements AddBatch() in a simple way: take the mutex, create series using the existing mechanisms, and make pauses every 512 series to allow reads to proceed: we don't want them to wait forever. This optimizes the locking/unlocking time: effectively taking it 512x times less. I think we could still do better: if batch is large enough (1000 elements?) we could send it through some workers optimizing the locked time. We can just ensure from the beginning that all labels exist, and then shard the labels across GOMAXPROCS workers, and send each label to it's specific worker, without having to coordinate them and still holding a single mutex. Signed-off-by: Oleg Zaytsev <[email protected]>
This modifies the method to copy the provided labels before creating them in the index and Head, and return the existing stored labels if possible. This is quite specific to our Mimir optimizations, but it's really needed if we want to use this method with all the current optimizations. Signed-off-by: Oleg Zaytsev <[email protected]>
Adding series to a.series will make sure they're tracked in the WAL. There's an increased risk of losing these series if gc() happens in between, that should be fixed once prometheus#16333 is merged. Signed-off-by: Oleg Zaytsev <[email protected]>
Signed-off-by: Oleg Zaytsev <[email protected]>
Signed-off-by: Oleg Zaytsev <[email protected]>
Signed-off-by: Oleg Zaytsev <[email protected]>
Signed-off-by: Oleg Zaytsev <[email protected]>
Signed-off-by: Oleg Zaytsev <[email protected]>
Signed-off-by: Oleg Zaytsev <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This adds an experimental
BatchSeriesRefs
method in the appender that should be used to create all series from a scrape/write request in a single call, reducing the mutex contention on MemPostings.Add for customers with a lot of churn and for Mimir ingesters that have recently started (or went out of read-only mode) and are receiving avalanches of new series.