
AlienKevin commented on Jan 16, 2026

This PR parallelizes the task instance gathering phase, which substantially speeds it up on large repositories, and adds support for running the gather phase in the Modal workflow.

Key Changes:

  • Parallel Gathering (swesmith/harness/gather.py):
    Before this PR, large repos such as math.js (more than 800 task instances) timed out after 20 minutes because git branch creation and `git push` ran sequentially. With this PR, such repos finish in minutes.
    • Used `ProcessPoolExecutor` to process task instances in parallel across multiple cores.
    • Added unique, PID-based clone paths (e.g., repo_name_pid_subfolder) to prevent race conditions during concurrent Git operations.
    • Refactored the main loop into a process_instance worker function.
  • Modal Support (scripts/bug_gen_modal.py):
    • Added a `--gather` CLI flag that runs only task instance gathering, skipping generation and validation.
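A minimal sketch of the parallel pattern described above, not the code in `swesmith/harness/gather.py`: `ProcessPoolExecutor` fans task instances out to a `process_instance` worker, and each worker derives its clone path from its own PID, so concurrent Git operations never share a directory. The helpers `clone_path` and `gather_parallel`, the job tuple layout, and the placeholder work are hypothetical; the `fork` start method is pinned only so the sketch runs as a plain script on Linux/macOS.

```python
import multiprocessing as mp
import os
import shutil
import tempfile
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path


def clone_path(base: Path, repo_name: str, subfolder: str) -> Path:
    """Hypothetical helper: a repo_name_pid_subfolder path unique to this process."""
    return base / f"{repo_name}_{os.getpid()}_{subfolder}"


def process_instance(job: tuple) -> dict:
    """Worker: handle one task instance in a directory no other process touches."""
    base, repo_name, instance_id = job
    path = clone_path(Path(base), repo_name, "work")
    path.mkdir(parents=True, exist_ok=True)
    try:
        # Placeholder for the real per-instance work
        # (clone, create branch, apply patch, commit, push).
        (path / "instance.txt").write_text(instance_id)
        return {"instance_id": instance_id, "clone_dir": path.name}
    finally:
        # Per-task clone: remove it so the next task in this worker starts clean.
        shutil.rmtree(path, ignore_errors=True)


def gather_parallel(repo_name, instance_ids, base, max_workers=4):
    # "fork" is pinned only so this sketch runs as a plain script (assumption).
    ctx = mp.get_context("fork")
    jobs = [(str(base), repo_name, iid) for iid in instance_ids]
    with ProcessPoolExecutor(max_workers=max_workers, mp_context=ctx) as pool:
        # map() preserves input order in the returned results.
        return list(pool.map(process_instance, jobs))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for result in gather_parallel("mathjs", ["a", "b", "c"], tmp, max_workers=2):
            print(result)
```

Because each path embeds `os.getpid()`, two workers can never write to the same directory, and a single worker handles its tasks one at a time, so it never races with itself.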

Question: should we rename FAIL_TO_PASS to PASS_TO_FAIL?
swe-smith currently uses FAIL_TO_PASS for tests that pass before the bug patch is applied but fail afterwards, which inverts the intuitive meaning and causes confusion. PASS_TO_FAIL is a more intuitive name, so I used that convention in this PR. However, adopting it would require updating the rest of the code and the datasets, so I'm not sure it's worth it.

Resolution: rename PASS_TO_FAIL back to FAIL_TO_PASS when writing the task instance JSON files, in line with the SWE-bench naming convention.
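The resolution amounts to renaming one key at serialization time. A minimal sketch, assuming task instances are held internally as dicts keyed `PASS_TO_FAIL`; the helper names `to_output_record` and `dump_instances` are hypothetical:

```python
import json

# Hypothetical key names: the internal name used while gathering, and the
# SWE-bench-style name written to the output JSON.
INTERNAL_KEY = "PASS_TO_FAIL"
OUTPUT_KEY = "FAIL_TO_PASS"


def to_output_record(instance: dict) -> dict:
    """Return a copy of `instance` with PASS_TO_FAIL renamed to FAIL_TO_PASS."""
    record = dict(instance)  # shallow copy; the caller's dict is left untouched
    if INTERNAL_KEY in record:
        record[OUTPUT_KEY] = record.pop(INTERNAL_KEY)
    return record


def dump_instances(instances: list) -> str:
    """Serialize task instances using the SWE-bench naming convention."""
    return json.dumps([to_output_record(i) for i in instances], indent=2)
```

Renaming only at the output boundary keeps the rest of the pipeline and the existing datasets on the FAIL_TO_PASS name.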

Test command

uv run modal run scripts/bug_gen_modal.py --language javascript --gather &> gather.log

AlienKevin and others added 8 commits January 16, 2026 08:56
Previously, the script would fail if `git commit` was attempted with no changes. This was observed in cases like `Automattic__mongoose.5f57a5bb` where the applied patch resulted in no tracked changes. Now, we check `git status --porcelain` before committing and skip the instance if no changes are detected.
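A minimal sketch of that guard. `git status --porcelain` prints nothing for a clean working tree (untracked files show up as `??` lines), so any non-empty line means there is something to commit. The helpers `has_uncommitted_changes` and `commit_if_changed`, the use of `git add -A`, and the convention that a `False` return means "skip this instance" are illustrative assumptions, not the PR's actual code:

```python
import subprocess
from pathlib import Path


def has_uncommitted_changes(porcelain_output: str) -> bool:
    """True if `git status --porcelain` output lists at least one path."""
    return any(line.strip() for line in porcelain_output.splitlines())


def commit_if_changed(repo_dir: Path, message: str) -> bool:
    """Commit all changes in repo_dir; return False (skip) if there are none."""
    status = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    if not has_uncommitted_changes(status):
        # e.g. a patch that produced no changes: skip instead of failing on commit
        return False
    subprocess.run(["git", "add", "-A"], cwd=repo_dir, check=True)
    subprocess.run(["git", "commit", "-m", message], cwd=repo_dir, check=True)
    return True
```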
codecov bot commented Jan 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

Files with missing lines | Coverage Δ
swesmith/profiles/base.py | 76.73% <100.00%> (-2.29%) ⬇️

... and 4 files with indirect coverage changes


- Switch from per-task clones to per-worker persistent repositories.
- Reduces clone operations from O(tasks) to O(workers) (e.g. 1400 -> 17).
- Eliminates file locking race conditions.
- Total gather time for JavaScript is now ~5 minutes (bottlenecked by math.js).
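One way to get one persistent repository per worker, sketched under assumptions: `ProcessPoolExecutor`'s `initializer` runs once in each worker process, so the (placeholder) clone happens O(workers) times and every task that worker receives reuses it. `init_worker`, `_worker_repo`, and the `CLONED` marker file are hypothetical; the real per-task Git steps are only indicated in comments, and `fork` is pinned only so the sketch runs as a plain script on Linux/macOS.

```python
import multiprocessing as mp
import os
import tempfile
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path
from typing import Optional

# Per-process state, set once by the pool initializer (hypothetical design).
_worker_repo: Optional[Path] = None


def init_worker(base: str, repo_name: str) -> None:
    """Runs once per worker process: set up that worker's persistent repo."""
    global _worker_repo
    _worker_repo = Path(base) / f"{repo_name}_{os.getpid()}"
    _worker_repo.mkdir(parents=True, exist_ok=True)
    # Placeholder for the one-time `git clone` into _worker_repo.
    (_worker_repo / "CLONED").touch()


def process_instance(instance_id: str) -> tuple:
    """Reuse the worker's repo for every task instead of cloning per task."""
    if _worker_repo is None:
        raise RuntimeError("init_worker did not run in this process")
    # Placeholder for per-task work inside the shared clone
    # (e.g. check out a fresh branch, apply the bug patch, commit, push).
    return instance_id, _worker_repo.name


def gather(instance_ids, base, repo_name, max_workers=4):
    ctx = mp.get_context("fork")  # assumption: POSIX-only, for a runnable sketch
    with ProcessPoolExecutor(
        max_workers=max_workers,
        mp_context=ctx,
        initializer=init_worker,
        initargs=(str(base), repo_name),
    ) as pool:
        return list(pool.map(process_instance, instance_ids))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        results = gather([f"task{i}" for i in range(100)], tmp, "mathjs", max_workers=4)
        clones = list(Path(tmp).iterdir())
        print(f"{len(results)} tasks, {len(clones)} clones")
```

The number of clone directories is bounded by `max_workers`, not by the number of tasks, which matches the O(tasks) to O(workers) reduction described in the commit message.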
AlienKevin force-pushed the kevin/bug-gen-gather branch from 8cefa3c to e42a5e2 on January 17, 2026 07:35
AlienKevin force-pushed the kevin/bug-gen-gather branch from 98898a8 to 302b73e on January 17, 2026 07:39
AlienKevin force-pushed the kevin/bug-gen-gather branch from 52ba582 to f7f68cb on January 20, 2026 08:20