-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Scalable jobserver for rustc #7731
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
aaaf59a to
747a7f6
Compare
|
CI here looks to be mostly failing because the current branch expects rustc to have -Zjobserver-token-requests. |
|
☔ The latest upstream changes (presumably #7737) made this pull request unmergeable. Please resolve the merge conflicts. |
|
Can this get a doc-comment in |
|
Hm, do you mean job_queue.rs? job_server.rs doesn't appear to exist in this repository. I would be happy to provide such a doc comment -- it's worth noting that I'm guessing we'll be landing this behind some sort of -Z flag as well initially (as we test it out and such) just because it's a non-trivial overhaul of the protocol we use with rustc. |
|
Oops, yea, typo. Should be |
b6e9714 to
b180ea8
Compare
|
I've added some (somewhat) high level docs to job_queue and rebased. |
|
Ok I've read over this and it looks pretty good to me, certainly for the purposes of testing this looks like it'll cover all our bases. @Mark-Simulacrum do you want to update your PR in rust-lang/rust to have a Cargo submodule with this commit, and then make a try build so we can test that out? I think longer-term I'd want to refactor the jobserver management in Cargo here, it feels a bit spaghetti-like and I'd like to make it a bit more concise (if we can) to make it a bit more robust. Nothing concrete I can think of, just a general feeling of "this feels bolted on" which is not at all your fault, just something for me to stew on :) Also, for the vecs, I think they can be relatively trivially converted to hash maps? I'd expect something like: let rustcs_with_a_token: HashMap<u32, Vec<Token>>;
let rustcs_wanting_a_token: HashMap<u32, (usize, Client)>; // `usize` == how many tokens requested |
|
Yeah, I think the HashMaps would work. I moved between a few different schemes during development and had started out with Vecs. I'll go ahead and get the rustc branch to point at this branch (and probably rebase both, while I'm at it) and kickoff a try build. |
b180ea8 to
e2d8d98
Compare
|
Okay I've force pushed this and the rust-lang/rust branch and kicked off a try build over there, hopefully that all works out. I didn't look at implementing the HashMap stuff as I'm also a bit strapped for time :) |
|
☔ The latest upstream changes (presumably #7776) made this pull request unmergeable. Please resolve the merge conflicts. |
e2d8d98 to
c38d324
Compare
|
CI failure is.. scary "E: Failed to fetch https://packages.microsoft.com/ubuntu/16.04/prod/dists/xenial/main/binary-amd64/Packages Writing more data than expected (622228 > 606754)" But seems unrelated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This overall looks very good to me now, thanks!
On closer reading I think the only "major" thing I'd like to see is a refactoring of the job queue code to have all of its assorted local variables wrapped up in a struct with a name, methods, etc. I think that'll make edits a bit easier to track and also easier to document things.
src/cargo/core/compiler/mod.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it may perhaps be a bit cleaner to instead of passing rustc_client around and sending it through messages to instead store it elsewhere. The processes spawned by the job queue I think should all have an optional Client associated with them (stored internally in ProcessBuilder), and job_queue could keep track of a map from id to Client in that case.
src/cargo/core/compiler/mod.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we'll probably want to bikeshed this a bit, but the json messages coming out of rustc aren't exactly the most structured things in terms of precedent anyway.
I'd ideally prefer to not ignore errors if Cargo is sure that event has to do with jobserver events, whereas right now if there's a slight mismatch Cargo will ignore everything coming from rustc. In any case this is probably a concern best left for stabilization.
babeba6 to
5cb8b08
Compare
|
Okay I've updated the PR, I think I addressed most of your comments. I opted not to implement the rustc_client out-of-band storage -- that looked pretty complicated; it seems like there's a semi-opaque wall around "Work" / "Job" units and it was unclear to me how to thread state through that... maybe I missed something obvious though. Furthermore, we eventually do need to keep track of the number of requests for a given JobId, so while we could e.g. do a usize for that, it seems simpler to just send around clients. Of course if you disagree then happy to try and dig in more :) I've also -- in the process of refactoring the main loop and extracting things out into methods -- tried to add documentation here and there but since I've been fairly embedded into this code for a few hours now I'm not sure where all confusion lies anymore :) I think someone with a semi-fresh set of eyes (you) would be more helpful in basically saying "this could use a comment" -- happy to write them up then. The latest set also shifts around the algorithm for jobserver token assignment under the per-rustc model, where before we were eagerly giving tokens to the first rustc that requested them, which plausibly lead to pretty poor "multi-rustc" parallelism in practice (since the first one would likely gobble up tokens, then the second one would do the same, and so forth, presuming highly parallel and quick to request tokens rustcs). I'm not sure we were observing that behavior but it seemed possibly suboptimal. The new strategy is to dedicate assignment to the main loop, which now proceeds in a strict "least recently requested" fashion among both types of requests. Specifically, we will first allocate tokens to all pending process token requests (i.e., high-level units of work), and then (in per-rustc jobserver mode) follow up by assigning any remaining tokens to rustcs, again in the order that requests came into cargo (first request served first). It's not quite clear that that model is good -- at least for the latter bit, maybe randomly spreading tokens around would be better, or some other heuristic, but on the other hand this means that it is likely that long-running crates will get more tokens than short-running crates, which means that crates like As a fairly orthogonal note I'm also somewhat increasingly unhappy with the jobserver crate's API -- I keep hitting the problem that the "request a token" interface does not provide for cancellation -- and we've also wanted e.g. Acquired::forget in rustc, so I'll probably put some resources there before moving on to fixing up the rustc PR. Generally my plan is to get this landed or in a state ready for landing and then move on to fixing up the rustc PR :) |
|
Tests here are all going to fail as I missed a minor detail where previously cargo would always immediately spawn up a rustc process ("sharing" the cargo implicit jobserver token with that rustc) but the new implementation would make that pretty painful -- ideally we'd synthesize the token out of nowhere, but the jobserver crate makes that hard and in any case it's not quite what we want. I think to get the correct behavior we'd basically end up with a lot of code trying to shuffle around the implicit token -- loosely, it'd look something like what the current codegen stuff does in rustc I imagine (less complicated due to everything being threadsafe though). I'm going to try and think some more on this, maybe we can discuss synchronously tomorrow too. Don't have any great ideas here :/ |
|
Oh, also forgot to mention, I updated the PR description with a summary of the PR, though it loosely duplicates what I said already in a comment. |
src/cargo/core/compiler/job_queue.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh sorry, to clarify, can this be a separate struct which only lives for the one stack frame where we're running stuff?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, as to this field specifically, I don't think we're going to want it. There should never be any long-term storage for unused tokens.
src/cargo/core/compiler/job_queue.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This I'm a little worried about since it will request a huge amount more than before. I don't think we want to do these requests on each turn of the loop, we should still strive to match up a request to an acquire 1:1
5cb8b08 to
0d41076
Compare
|
Okay I dropped the commit which refactored into the unused_tokens and so forth and then restored it a bit. I also moved the state into a new struct which only exists while we're draining the job queue. |
|
Turns out the crossbeam scope was needed, unlike what the comment indicated... |
src/cargo/core/compiler/job_queue.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the purpose of this loop is to actually not call this in the case that the events are not empty, so was this perhaps a stray refactoring error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Especially in the case where rustc is using this channel for the jobserver protocol, events may not be empty for a long time (i.e., we're constantly trying to wait for threads, releasing tokens, etc.). I can drop this from this PR though and file it separately since it's a drive by change if anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm ok, mind moving this to a separate commit? I think it'd be best to handle this with a sort of heartbeat message if it becomes problematic. Did you see extraneous stuttering you weren't expecting when building though?
f247393 to
eee14c0
Compare
|
I think that this latest push resolves your comments (dropped I also realize this has a ton of commits now -- do you want me to squash them down? I'll look at the CI failures later. |
This also moves and enhances the message logging the state before blocking.
eee14c0 to
7e3e1b1
Compare
|
CI is now passing. |
|
@bors: r+ Very nice, thanks so much! Commits are fine :) |
|
📌 Commit 7e3e1b1 has been approved by |
Scalable jobserver for rustc This refactors the job queue code for support of [per-rustc process jobservers](rust-lang/rust#67398). Furthermore, it also cleans up the code and refactors the main loop to be more amenable to understanding (splitting into methods and such). Assignment of tokens to either rustc "threads" or processes is dedicated to the main loop, which proceeds in a strict "least recently requested" fashion among both thread and process token requests. Specifically, we will first allocate tokens to all pending process token requests (i.e., high-level units of work), and then (in per-rustc jobserver mode) follow up by assigning any remaining tokens to rustcs, again in the order that requests came into cargo (first request served first). It's not quite clear that that model is good (no modeling or so has been done). On the other hand this strategy should mean that long-running crates will get more thread tokens once we bottom out in terms of rustc parallelism than short-running crates, which means that crates like syn which start early on but finish pretty late should hopefully get more parallelism nicely (without any more complex heuristics). One plausible change that may be worth exploring is making the assignment prefer earlier rustc's, globally, rather than first attempting to spawn new crates and only then increasing parallelism for old crates. syn for example frequently gets compiled in the early storm of dozens of crates so is somewhat unlikely to have parallelism, until fairly late in its compilation. We also currently conflate under this model the rayon threads and codegen threads. Eventually inside rustc those will probably(?) also be just one thing, and the rustc side of this implementation provides no information as to what the token request is for so we can't do better here yet.
|
☀️ Test successful - checks-azure |
|
This PR seems to be causing panics with debug builds of Cargo. @Mark-Simulacrum can you check? The repro I have is:
It panics at progress.rs:259 where in this case that function is called with bactrace Compiling bitflags v1.2.1
thread 'main' panicked at 'attempt to subtract with overflow', src/cargo/util/progress.rs:259:21
stack backtrace:
0: backtrace::backtrace::libunwind::trace
at /Users/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/libunwind.rs:88
1: backtrace::backtrace::trace_unsynchronized
at /Users/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/mod.rs:66
2: std::sys_common::backtrace::_print_fmt
at src/libstd/sys_common/backtrace.rs:77
3: ::fmt
at src/libstd/sys_common/backtrace.rs:59
4: core::fmt::write
at src/libcore/fmt/mod.rs:1057
5: std::io::Write::write_fmt
at src/libstd/io/mod.rs:1426
6: std::sys_common::backtrace::_print
at src/libstd/sys_common/backtrace.rs:62
7: std::sys_common::backtrace::print
at src/libstd/sys_common/backtrace.rs:49
8: std::panicking::default_hook::{{closure}}
at src/libstd/panicking.rs:195
9: std::panicking::default_hook
at src/libstd/panicking.rs:215
10: std::panicking::rust_panic_with_hook
at src/libstd/panicking.rs:463
11: rust_begin_unwind
at src/libstd/panicking.rs:371
12: std::panicking::begin_panic
13: std::panicking::begin_panic
14: cargo::util::progress::Format::progress
at src/cargo/util/progress.rs:259
15: cargo::util::progress::State::tick
at src/cargo/util/progress.rs:176
16: cargo::util::progress::Progress::tick_now
at src/cargo/util/progress.rs:107
17: cargo::core::compiler::job_queue::DrainState::tick_progress
at src/cargo/core/compiler/job_queue.rs:714
18: cargo::core::compiler::job_queue::DrainState::wait_for_events
at src/cargo/core/compiler/job_queue.rs:600
19: cargo::core::compiler::job_queue::DrainState::drain_the_queue
at src/cargo/core/compiler/job_queue.rs:648
20: cargo::core::compiler::job_queue::JobQueue::execute::{{closure}}
at src/cargo/core/compiler/job_queue.rs:384
21: crossbeam_utils::thread::scope::{{closure}}
at /Users/eric/.cargo/registry/src/github.com-1ecc6299db9ec823/crossbeam-utils-0.7.0/src/thread.rs:161
22: as core::ops::function::FnOnce<()>>::call_once
at /rustc/85976442558bf2d09cec3aa49c9c9ba86fb15c1f/src/libstd/panic.rs:318
23: std::panicking::try::do_call
at /rustc/85976442558bf2d09cec3aa49c9c9ba86fb15c1f/src/libstd/panicking.rs:296
24: __rust_maybe_catch_panic
at src/libpanic_unwind/lib.rs:79
25: std::panicking::try
at /rustc/85976442558bf2d09cec3aa49c9c9ba86fb15c1f/src/libstd/panicking.rs:272
26: std::panic::catch_unwind
at /rustc/85976442558bf2d09cec3aa49c9c9ba86fb15c1f/src/libstd/panic.rs:394
27: crossbeam_utils::thread::scope
at /Users/eric/.cargo/registry/src/github.com-1ecc6299db9ec823/crossbeam-utils-0.7.0/src/thread.rs:161
28: cargo::core::compiler::job_queue::JobQueue::execute
at src/cargo/core/compiler/job_queue.rs:384
29: cargo::util::config::clippy_driver::{{closure}}
30: cargo::ops::cargo_compile::compile_ws
at src/cargo/ops/cargo_compile.rs:469
31: cargo::ops::cargo_compile::compile_with_exec
at src/cargo/ops/cargo_compile.rs:259
32: cargo::ops::cargo_compile::compile
at src/cargo/ops/cargo_compile.rs:248
33: cargo::ops::cargo_test::compile_tests
at src/cargo/ops/cargo_test.rs:69
34: cargo::ops::cargo_test::run_tests
at src/cargo/ops/cargo_test.rs:21
35: cargo::commands::test::exec
at src/bin/cargo/commands/test.rs:162
36: cargo::cli::execute_subcommand
at src/bin/cargo/cli.rs:197
37: cargo::cli::main
at src/bin/cargo/cli.rs:107
38: cargo::main
at src/bin/cargo/main.rs:39
39: std::rt::lang_start::{{closure}}
at /rustc/85976442558bf2d09cec3aa49c9c9ba86fb15c1f/src/libstd/rt.rs:67
40: std::rt::lang_start_internal::{{closure}}
at src/libstd/rt.rs:52
41: std::panicking::try::do_call
at src/libstd/panicking.rs:296
42: __rust_maybe_catch_panic
at src/libpanic_unwind/lib.rs:79
43: std::panicking::try
at src/libstd/panicking.rs:272
44: std::panic::catch_unwind
at src/libstd/panic.rs:394
45: std::rt::lang_start_internal
at src/libstd/rt.rs:51
46: std::rt::lang_start
at /rustc/85976442558bf2d09cec3aa49c9c9ba86fb15c1f/src/libstd/rt.rs:67
47: cargo::init_git_transports
|
|
Hm, not impossible, will take a look. I guess we're generating too many completed jobs? It'll probably be tomorrow at the earliest though, if necessary please do revert. |
|
Filed #7829 to resolve this bug. |
Store maximum queue length Previously, the queue length was constantly decreasing as we built crates, which meant that we were incorrectly displaying the progress bar. In debug builds, this even led to panics (due to underflow on subtraction). Not sure if we can add a test case for this. I have made the panic unconditional on release/debug though by explicitly checking that current is less than the maximum for the progress bar. Fixes #7731 (comment).
| while let Err(e) = client.acquire_raw() { | ||
| anyhow::bail!( | ||
| "failed to fully drain {}/{} token from jobserver at startup: {:?}", | ||
| i, | ||
| tokens, | ||
| e, | ||
| ); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the intent of this loop? Should it maybe be something like this?
client.acquire_raw().chain_err(|| {
format!(
"failed to fully drain {}/{} token from jobserver at startup",
i, tokens
)
})?;There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I think so -- it doesn't really make any difference (bail! returns from the method anyway), but I'll send a PR cleaning it up. I suspect I had intended this as "loop until you error" but that's clearly not what this does :)
This refactors the job queue code for support of per-rustc process jobservers. Furthermore, it also cleans up the code and refactors the main loop to be more amenable to understanding (splitting into methods and such).
Assignment of tokens to either rustc "threads" or processes is dedicated to the main loop, which proceeds in a strict "least recently requested" fashion among both thread and process token requests. Specifically, we will first allocate tokens to all pending process token requests (i.e., high-level units of work), and then (in per-rustc jobserver mode) follow up by assigning any remaining tokens to rustcs, again in the order that requests came into cargo (first request served first).
It's not quite clear that that model is good (no modeling or so has been done). On the other hand this strategy should mean that long-running crates will get more thread tokens once we bottom out in terms of rustc parallelism than short-running crates, which means that crates like syn which start early on but finish pretty late should hopefully get more parallelism nicely (without any more complex heuristics).
One plausible change that may be worth exploring is making the assignment prefer earlier rustc's, globally, rather than first attempting to spawn new crates and only then increasing parallelism for old crates. syn for example frequently gets compiled in the early storm of dozens of crates so is somewhat unlikely to have parallelism, until fairly late in its compilation.
We also currently conflate under this model the rayon threads and codegen threads. Eventually inside rustc those will probably(?) also be just one thing, and the rustc side of this implementation provides no information as to what the token request is for so we can't do better here yet.