The SWE pack MVP (#238, shipped in #247) can import a world from a SWE-bench-style instance, prove it's well-posed, and grade an agent's edits against the repo's held-out tests. This issue tracks the next step: generating SWE worlds ourselves instead of only importing them — plus a curriculum that makes them harder over time.
Everything below reuses the ontology, grader, and admission self-test that already ship. Only the source of the world changes. Full design: packs/swe/DESIGN.md ("World-sources" + "Implementation issue tree").
What already works (the MVP)
Import a SWE-bench-style world — from a bundled fixture or a cloned repo@commit — and grade an agent's edited tree against tests it never sees.
Two task shapes on one world: swe.fix (all-or-nothing repair) and swe.build (partial credit from unit tests, success gated by integration tests).
Sandboxed grading, a multi-turn run_tests tool, and the training reward seam.
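The difference between the two task shapes can be sketched as two scoring functions over the same held-out test results. This is a minimal illustration, not the shipped grader: `TestResults`, `Grade`, `grade_fix`, and `grade_build` are hypothetical names, and it assumes that swe.fix requires every held-out test to pass and that swe.build counts as a success only when every integration test passes.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TestResults:
    """Pass/fail outcomes keyed by test id (hypothetical shape)."""
    unit: dict[str, bool]
    integration: dict[str, bool]


@dataclass(frozen=True)
class Grade:
    score: float   # reward signal in [0, 1]
    success: bool  # did the agent solve the task?


def grade_fix(results: TestResults) -> Grade:
    """swe.fix sketch: all-or-nothing, every held-out test must pass."""
    outcomes = list(results.unit.values()) + list(results.integration.values())
    passed = bool(outcomes) and all(outcomes)
    return Grade(score=1.0 if passed else 0.0, success=passed)


def grade_build(results: TestResults) -> Grade:
    """swe.build sketch: partial credit from unit tests, success gated by integration tests."""
    unit = list(results.unit.values())
    score = sum(unit) / len(unit) if unit else 0.0
    # Assumption: "gated" means all integration tests must pass.
    success = bool(results.integration) and all(results.integration.values())
    return Grade(score=score, success=success)
```

With one of two unit tests passing and the single integration test green, `grade_fix` returns a score of 0.0 and no success, while `grade_build` returns a score of 0.5 and success.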
What's next
1. Injected worlds: generate the bug. Start from a green repo + test suite and automatically splice in a defect (an AST edit) so the base fails and a known fix passes, with no scraped issues required. Mirrors the cyber pack's vuln injector.
   Done when: an injected world passes the admission self-test (bare base fails, gold fix greens the suite), with the injected defect serving as the admission mutation.
2. Curriculum: make them harder. Use the same evolve(...) / available_mutations seam the trading pack uses to grow harder defects, stricter suites, and thinner scaffolding.
   Done when: evolve(...) produces a strictly harder admitted world, with a success-rate curve demoed (Curriculum-driven training demo #200).
3. Authored worlds: let an LLM write them. An LLM writes a feature, its tests, and the reference fix; admission's self-test rejects the ill-posed ones. Builds on (1) plus an LLM backend (Add API-key LLM backends #188).
   Done when: an authored world admits and is solvable by its own reference fix.
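The injected-world idea (splice a defect into green code, then check that the base fails and the gold fix passes) can be sketched with Python's standard `ast` module. Everything here is illustrative: `FlipFirstLt`, `inject_defect`, `passes`, and `admits` are hypothetical names, the "repo" is a single in-memory function, and `exec` stands in for the pack's sandboxed test run.

```python
import ast

# A tiny "green" module standing in for a repo whose suite passes.
GREEN_SOURCE = '''
def clamp(x, lo, hi):
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x
'''


def held_out_tests(ns):
    """Stand-in for the held-out suite: True when every check passes."""
    clamp = ns["clamp"]
    return clamp(5, 0, 10) == 5 and clamp(-1, 0, 10) == 0 and clamp(11, 0, 10) == 10


class FlipFirstLt(ast.NodeTransformer):
    """Replace the first `<` comparison with `>` (one illustrative defect)."""

    def __init__(self):
        self.done = False

    def visit_Compare(self, node):
        self.generic_visit(node)
        if not self.done and isinstance(node.ops[0], ast.Lt):
            node.ops[0] = ast.Gt()
            self.done = True
        return node


def inject_defect(source):
    """Return a defective copy of `source` produced by an AST edit."""
    tree = FlipFirstLt().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # requires Python 3.9+


def passes(source):
    """Run the stand-in suite against `source` (no sandboxing in this sketch)."""
    ns = {}
    exec(compile(source, "<world>", "exec"), ns)
    return held_out_tests(ns)


def admits(base, gold):
    """Admission self-test sketch: the bare base fails and the gold fix greens the suite."""
    return (not passes(base)) and passes(gold)


base = inject_defect(GREEN_SOURCE)   # the injected defect becomes the base
gold = GREEN_SOURCE                  # the original green code is the known fix
world_is_well_posed = admits(base, gold)
```

Here the original green source doubles as the gold fix, so no scraped issue is needed. A defect that the suite fails to catch is rejected, because `admits` returns False whenever the base still passes.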
Related work (tracked elsewhere)
This takes over from #238 as the SWE pack's roadmap tracker now that the MVP has shipped.