
Parallel parsing doesn't balance workloads well when some files are much larger than others #1808

@MikePopoloski

Description


Discussed in #1793

Originally posted by acarlson1029 April 16, 2026
Currently in SourceLoader.cpp there is the following code for parallelizing the parsing phase:

        // Load all source files that were specified on the command line
        // or via library maps.
        pool->detach_loop(size_t(0), fileEntries.size(), [&](size_t i) {
            loadResults[i] = loadAndParse(fileEntries[i], optionBag, srcOptions, i);
        });
        pool->wait();

From the definition of detach_loop:

Parallelize a loop by automatically splitting it into blocks and submitting each block separately to the queue, with the specified priority. The loop function takes one argument, the loop index, and it is called exactly once per index, but many times per block. Does not return a BS::multi_future, so the user must use wait() or some other method to ensure that the loop finishes executing, otherwise bad things will happen.

Current behavior

This splits the file list into fixed contiguous blocks, one block per thread. For large file lists containing large individual files, one thread was observed to become the long pole during parsing, leaving all of the other threads idle after they had finished their blocks.

See the trace generated as the baseline with a real file list. This run used 64 threads and took 45 seconds. Typical performance depends on how the files are distributed between the blocks: if one block happens to contain many large files (e.g. parsing generated code), it can take substantially longer than the rest.

slang_trace_baseline_sized.json

Updated behavior

This can be optimized by using detach_sequence instead.

Submit a sequence of tasks enumerated by indices to the queue, with the specified priority. The sequence function takes one argument, the task index, and will be called once per index. Does not return a BS::multi_future, so the user must use wait() or some other method to ensure that the sequence finishes executing, otherwise bad things will happen.

This lets each thread pull the next available task as soon as it finishes its current one, so no thread sits idle while work remains.

See the trace generated with this optimization enabled. This run also used 64 threads and took only 33 seconds.

slang_trace_optimized_sized.json

NOTE: I haven't benchmarked this with a small list of files to see if it causes a performance regression.
