Parallelize AST extraction and rewrite Louvain clustering#3
Merged
Conversation
Two independent hot paths were dominating wall-clock time on large
repositories. Both showed the same symptom — long runtime with low CPU
utilization (extraction) or sustained single-core saturation
(clustering) — and both scaled poorly with node count.
---
1. Parallelize stage 2 (AST extraction) in PipelineRunner
The pipeline previously ran extraction with a sequential foreach:
foreach (var file in detectedFiles)
await extractor.ExecuteAsync(file, cancellationToken);
Each call performs file I/O, so the loop spent most of its time
awaiting the disk while a single core sat idle. Wall-clock time grew
linearly with file count.
Replaced with Parallel.ForEachAsync bounded by Environment.ProcessorCount.
Extractor only holds a read-only dictionary of language extractors and
returns a fresh ExtractionResult per call, so concurrent invocation is
safe.
Concurrency details:
- Results are collected into a ConcurrentBag<ExtractionResult> and
flattened into the existing list after the loop.
- processed / skipped counters use Interlocked.Increment.
- Verbose per-file failure messages are buffered into a
ConcurrentQueue<string> and flushed sequentially after the loop;
TextWriter is not thread-safe and interleaved writes from parallel
workers would corrupt output.
- Cancellation flows through ParallelOptions.CancellationToken.
Stage 2b (LLM-backed semantic extraction) is intentionally left
sequential — LLM providers throttle aggressive concurrency and that
stage needs a separate, lower parallelism cap if it's parallelized
later.
Expected impact: near-linear speedup on AST extraction up to disk
throughput, typically 4–10x on large codebases.
---
2. Rewrite Louvain community detection in ClusterEngine
Stage 4 ("Detecting communities") was the dominant cost on graphs
with thousands of nodes. The root cause was inside
CalculateModularityGain (and its subgraph twin): every gain evaluation
walked all N nodes and re-summed each one's edge weights:
foreach (var otherNode in graph.GetNodes())
otherDegree = graph.GetEdges(otherNode.Id).Sum(e => e.Weight);
That made a single gain evaluation O(N · Ē). The outer Louvain loop
calls it once per (node, neighboring-community) per iteration, so the
overall cost was effectively O(iterations · N² · Ē). On a 10k-node /
50k-edge graph this is billions of operations — exactly why CPU was
pinned and wall-clock high.
Rewrote both the main Louvain pass and SplitCommunity to use standard
incremental aggregates:
- Build a per-node weighted adjacency dictionary once at the start of
DetectCommunities (collapses parallel edges into a single
neighbor → summed-weight entry).
- Compute each node's weighted degree once.
- Maintain communityTotalDegree[c] incrementally: when a node moves
from a → b, subtract its degree from a and add it to b.
- For each move evaluation, walk only the moving node's own neighbors
to bucket weight per neighboring community in one pass — O(degree)
instead of O(N · Ē).
- Reuse the edgesToCommunity dictionary across iterations
(.Clear() instead of allocating).
The same shape is applied inside SplitCommunity, with a sub-adjacency
restricted to the induced subgraph for the oversized community.
The now-redundant CalculateModularityGain and
CalculateSubgraphModularityGain helpers are removed; the gain math is
inlined where it is used. The public CalculateModularity and
CalculateCohesion static methods are unchanged. The modularity gain
formula is preserved bit-for-bit so community assignments stay
equivalent to the previous implementation.
Total cost drops from O(iterations · N² · Ē) to O(iterations · E),
roughly an N× speedup on large graphs.
---
Validation
- All 573 unit tests in Graphify.Tests pass (10 ClusterEngine tests,
plus full suite).
- No public API changes.
- No behavioral changes to graph output (community ids may differ
only by the ordering tie-break already present in the original
implementation).
Owner
|
Validation update: I checked out this PR and ran the full solution tests locally (614 passed, 0 failed). I’m adding a focused regression test for the new Louvain weighted-adjacency/parallel-edge path, then re-running tests before final merge recommendation. |
Owner
|
Final update: I validated PR #3 end-to-end, then merged it.\n\n✅ Validation: full solution tests passed locally\n- Before merge: 615/615 passing\n- After merge on main: 615/615 passing\n\n✅ Follow-up hardening: I added a regression test for parallel-edge weighted-adjacency clustering behavior in ClusterEngineTests and pushed it to main in commit 64cd03c.\n\nThis PR is now merged/closed and main is green from local verification. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Two independent hot paths were dominating wall-clock time on large repositories. Both showed the same symptom — long runtime with low CPU utilization (extraction) or sustained single-core saturation (clustering) — and both scaled poorly with node count. ---
foreach (var file in detectedFiles)
await extractor.ExecuteAsync(file, cancellationToken);
Each call performs file I/O, so the loop spent most of its time awaiting the disk while a single core sat idle. Wall-clock time grew linearly with file count.
Replaced with Parallel.ForEachAsync bounded by Environment.ProcessorCount. Extractor only holds a read-only dictionary of language extractors and returns a fresh ExtractionResult per call, so concurrent invocation is safe.
Concurrency details:
Expected impact: near-linear speedup on AST extraction up to disk throughput, typically 4–10x on large codebases.
foreach (var otherNode in graph.GetNodes())
otherDegree = graph.GetEdges(otherNode.Id).Sum(e => e.Weight);
That made a single gain evaluation O(N · Ē). The outer Louvain loop calls it once per (node, neighboring-community) per iteration, so the overall cost was effectively O(iterations · N² · Ē). On a 10k-node / 50k-edge graph this is billions of operations — exactly why CPU was pinned and wall-clock high.
Rewrote both the main Louvain pass and SplitCommunity to use standard incremental aggregates:
CalculateSubgraphModularityGain helpers are removed; the gain math is inlined where it is used. The public CalculateModularity and CalculateCohesion static methods are unchanged. The modularity gain formula is preserved bit-for-bit so community assignments stay equivalent to the previous implementation.
Total cost drops from O(iterations · N² · Ē) to O(iterations · E), roughly an N× speedup on large graphs.
Validation