Originally posted by acarlson1029 April 16, 2026
Currently in `SourceLoader.cpp` there is the following code for parallelizing the parsing phase:
```cpp
// Load all source files that were specified on the command line
// or via library maps.
pool->detach_loop(size_t(0), fileEntries.size(), [&](size_t i) {
    loadResults[i] = loadAndParse(fileEntries[i], optionBag, srcOptions, i);
});
pool->wait();
```
From the definition of `detach_loop`:

> Parallelize a loop by automatically splitting it into blocks and submitting each block separately to the queue, with the specified priority. The loop function takes one argument, the loop index, and it is called exactly once per index, but many times per block. Does not return a `BS::multi_future`, so the user must use `wait()` or some other method to ensure that the loop finishes executing, otherwise bad things will happen.
**Current behavior**
This splits the files into fixed blocks, one per thread. For large file lists containing large individual files, one thread was observed to become the long pole during parsing, leaving all of the other threads idle once they had finished their own blocks.
See the trace generated as the baseline with a real file list. This ran with 64 threads and took 45 seconds. The actual performance depends on how the files are distributed across the blocks: if one block happens to contain many large files (e.g. parsing generated code), the run can take substantially longer.
slang_trace_baseline_sized.json
**Updated behavior**
This can be optimized by using `detach_sequence` instead. From its definition:

> Submit a sequence of tasks enumerated by indices to the queue, with the specified priority. The sequence function takes one argument, the task index, and will be called once per index. Does not return a `BS::multi_future`, so the user must use `wait()` or some other method to ensure that the sequence finishes executing, otherwise bad things will happen.
This enqueues one task per index, so each thread pulls the next available task as soon as it finishes its current one, preventing any thread from sitting idle while work remains.
See the trace generated with this optimization enabled. This ran with 64 threads and took only 33 seconds.
slang_trace_optimized_sized.json
NOTE: I haven't benchmarked this with a small list of files to see if it causes a performance regression.
Discussed in #1793