add concurrency option to control max concurrent writes to db #460
Open
prydonius wants to merge 2 commits into shuttle-hq:master from
Conversation
prydonius commented on Oct 15, 2024
```rust
let params = DataSourceParams {
    uri: URI::try_from(self.path.as_str())?,
    schema: None,
    concurrency: 1,
```
Author
Not really sure what this is used for, or what the default concurrency should be here.
```rust
uri: URI::try_from(cmd.from.as_str())
    .with_context(|| format!("Parsing import URI '{}'", cmd.from))?,
schema: cmd.schema,
concurrency: 1,
```
Author
I've hardcoded import jobs to concurrency 1, since I think we only use a single db connection there, but let me know if it should also be exposed in the import command.
38fd233 to fdb6904
When utilising Synth to export millions of rows of generated data to a database, we noticed we would consistently get `pool timed out while waiting for an open connection` errors after some amount of time. This problem is particularly noticeable when batch writes take longer (in our testing, batches eventually took several seconds to write, especially for certain tables with triggers). As a result, Synth crashes after a few minutes and only writes part of the data.

We found that the issue is related to how Synth concurrently writes batches of rows to the database: it chunks rows by 1000 and then spins up that many tasks, each waiting to acquire a db connection from the pool. If writes take too long, these tasks start hitting the acquire timeout (which defaults to 30s in sqlx).
This PR introduces a new `concurrency` parameter that limits the number of tasks we spin up at a time, so that tasks are not left waiting unnecessarily for a connection from the pool. This allows Synth to take as long as it needs to export a large amount of data, and lets users configure the concurrency as needed. The pool size is also set to this parameter, since there will be at most that many connections to the database at any one time.

I've manually tested this change in our environment and it no longer produces the timeouts we were seeing. We're using MySQL, so I haven't tested this with other database providers. If there's more testing I should do or add to the codebase, please let me know!
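To illustrate the idea (not Synth's actual code), here is a minimal standard-library sketch of bounded concurrency: a channel of permits caps how many batch writes are in flight at once, the way the `concurrency` parameter caps Synth's spawned tasks. All names and the `Vec<u32>` "batch" type are hypothetical stand-ins for the real async tasks and sqlx pool.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical sketch: at most `concurrency` batch "writes" run at a
// time. A channel of unit permits plays the role of the pool limit.
fn write_batches(batches: Vec<Vec<u32>>, concurrency: usize) -> usize {
    let (permit_tx, permit_rx) = mpsc::channel();
    // Seed the channel with `concurrency` permits.
    for _ in 0..concurrency {
        permit_tx.send(()).unwrap();
    }
    let mut handles = Vec::new();
    for batch in batches {
        // Block until a permit is free, instead of spawning every task
        // up front and letting them race (and time out) on connections.
        permit_rx.recv().unwrap();
        let tx = permit_tx.clone();
        handles.push(thread::spawn(move || {
            let n = batch.len(); // stand-in for the actual DB write
            tx.send(()).unwrap(); // return the permit
            n
        }));
    }
    // Total rows "written" across all batches.
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```

With `concurrency` permits seeded, the loop can never have more than that many writes outstanding, so no task sits waiting past the pool's acquire timeout; the backpressure lands in the spawning loop instead.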
I have never written a line of Rust before, so I'd appreciate any feedback on this change and whether there are ways to make it more idiomatic.