SWE pack: generate worlds (not just import them) + curriculum #248

@larstalian


The SWE pack MVP (#238, shipped in #247) can import a world from a SWE-bench-style instance, prove it's well-posed, and grade an agent's edits against the repo's held-out tests. This issue tracks the next step: generating SWE worlds ourselves instead of only importing them — plus a curriculum that makes them harder over time.

Everything below reuses the ontology, grader, and admission self-test that already ship. Only the source of the world changes. Full design: packs/swe/DESIGN.md ("World-sources" + "Implementation issue tree").

What already works (the MVP)

  • Import a SWE-bench-style world — from a bundled fixture or a cloned repo@commit — and grade an agent's edited tree against tests it never sees.
  • Two task shapes on one world: swe.fix (all-or-nothing repair) and swe.build (partial credit from unit tests, success gated by integration tests).
  • Sandboxed grading, a multi-turn run_tests tool, and the training reward seam.
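A minimal sketch of how the two task shapes could turn one held-out test run into a reward. The `RunReport` type and its field names are assumptions made for illustration. So is the split into unit and integration counts for swe.fix. None of this is the pack's actual grader API.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class RunReport:
    """Hypothetical summary of one held-out test run, split by suite."""
    unit_passed: int
    unit_total: int
    integration_passed: int
    integration_total: int


def grade_fix(report: RunReport) -> float:
    """swe.fix: all-or-nothing, 1.0 only when every held-out test passes."""
    all_green = (report.unit_passed == report.unit_total
                 and report.integration_passed == report.integration_total)
    return 1.0 if all_green else 0.0


def grade_build(report: RunReport) -> Tuple[float, bool]:
    """swe.build: partial credit from unit tests; success is gated on integration tests."""
    partial = report.unit_passed / report.unit_total if report.unit_total else 0.0
    success = report.integration_passed == report.integration_total
    return partial, success


# An edit that passes 3 of 4 unit tests and both integration tests.
report = RunReport(unit_passed=3, unit_total=4, integration_passed=2, integration_total=2)
print(grade_fix(report))    # 0.0
print(grade_build(report))  # (0.75, True)
```

In this sketch, `success` stays separate from the partial score. That lets a trainer reward progress on unit tests without counting a build as done until the integration tests pass.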

What's next

  1. Injected worlds — generate the bug. Start from a green repo and its test suite, then automatically splice in a defect (an AST edit) so the base fails and the known fix (reverting that edit) passes. No scraped issues required. Mirrors the cyber pack's vuln injector.

    • Done when: an injected world passes the admission self-test (bare base fails, gold fix greens the suite), with the injected defect serving as the admission mutation.
  2. Curriculum — make them harder. Use the same evolve(...) / available_mutations seam the trading pack uses to grow harder defects, stricter suites, and thinner scaffolding.

  3. Authored worlds — let an LLM write them. An LLM writes a feature, its tests, and the reference fix; admission's self-test rejects the ill-posed ones. Builds on (1) plus an LLM backend (#188, Add API-key LLM backends).

    • Done when: an authored world admits and is solvable by its own reference fix.
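Here is a minimal, in-process sketch of item 1 using Python's standard `ast` module. One hypothetical mutation operator turns the first `<` into `<=`, which splices an off-by-one defect into a green module. `admit` then applies the self-test from the done-when criterion. The defective base must fail. The untouched original, acting as the gold fix, must pass.

`FlipFirstComparison`, `inject_defect`, `suite_passes`, `admit`, and `check_count_below` are hypothetical names. Running tests through `exec` stands in for the pack's sandboxed grader and is only for illustration. A check of this shape could also filter LLM-authored worlds in item 3.

```python
import ast
from typing import Callable, Dict, List

# A check takes the module's namespace and raises on failure.
Check = Callable[[Dict[str, object]], None]


class FlipFirstComparison(ast.NodeTransformer):
    """Hypothetical mutation operator: rewrite the first `<` found into `<=`."""

    def __init__(self) -> None:
        self.applied = False

    def visit_Compare(self, node: ast.Compare) -> ast.AST:
        if not self.applied:
            for i, op in enumerate(node.ops):
                if isinstance(op, ast.Lt):
                    node.ops[i] = ast.LtE()  # the injected off-by-one defect
                    self.applied = True
                    break
        self.generic_visit(node)  # keep walking nested expressions
        return node


def inject_defect(source: str) -> str:
    """Return a copy of `source` with one defect spliced in by an AST edit."""
    tree = ast.parse(source)
    mutator = FlipFirstComparison()
    tree = ast.fix_missing_locations(mutator.visit(tree))
    if not mutator.applied:
        raise ValueError("no `<` comparison available to mutate")
    return ast.unparse(tree)  # ast.unparse requires Python 3.9+


def suite_passes(source: str, checks: List[Check]) -> bool:
    """Run every check against `source`; any exception counts as a failure."""
    namespace: Dict[str, object] = {}
    try:
        exec(source, namespace)
        for check in checks:
            check(namespace)
    except Exception:
        return False
    return True


def admit(base: str, gold: str, checks: List[Check]) -> bool:
    """Admission self-test: the bare base fails and the gold fix greens the suite."""
    return not suite_passes(base, checks) and suite_passes(gold, checks)


GOLD = '''
def count_below(xs, limit):
    return sum(1 for x in xs if x < limit)
'''


def check_count_below(ns):
    assert ns["count_below"]([1, 2, 3], 3) == 2


BASE = inject_defect(GOLD)
print(BASE)
print(admit(BASE, GOLD, [check_count_below]))  # True: base fails, gold passes
```

Here, the injected defect doubles as the admission mutation. If admission succeeds, the suite is known to detect the defect, and reverting the defect fixes it.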
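The curriculum in item 2 could be a loop that proposes harder variants of a world and keeps only those that still admit. This sketch models a world with three hypothetical knobs, one for each axis named above: defects, suite strictness, and scaffolding.

`WorldSpec`, `harder_variants`, and `evolve_step` are invented stand-ins. They only mirror the shape of the `evolve(...)` / `available_mutations` seam, whose real signatures are not given here. The `admits` callback is a toy placeholder for the real admission self-test.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class WorldSpec:
    """Hypothetical difficulty knobs for one generated SWE world."""
    n_defects: int = 1      # harder defects: how many bugs get injected
    hidden_tests: int = 10  # stricter suites: held-out tests the edit must pass
    visible_tests: int = 5  # thinner scaffolding: tests the agent may run


Mutation = Callable[[WorldSpec], WorldSpec]


def harder_variants(spec: WorldSpec) -> List[Mutation]:
    """List mutations that each make `spec` harder along one axis."""
    mutations: List[Mutation] = [
        lambda s: replace(s, n_defects=s.n_defects + 1),
        lambda s: replace(s, hidden_tests=s.hidden_tests + 5),
    ]
    if spec.visible_tests > 0:  # cannot thin scaffolding below zero
        mutations.append(lambda s: replace(s, visible_tests=s.visible_tests - 1))
    return mutations


def evolve_step(spec: WorldSpec, admits: Callable[[WorldSpec], bool]) -> WorldSpec:
    """Return the first harder variant that still passes admission, else `spec` unchanged."""
    for mutate in harder_variants(spec):
        candidate = mutate(spec)
        if admits(candidate):
            return candidate
    return spec


# Toy admission rule for illustration: pretend worlds with more than 2 defects are ill-posed.
spec = WorldSpec()
for _ in range(3):
    spec = evolve_step(spec, admits=lambda s: s.n_defects <= 2)
print(spec)
```

`evolve_step` takes the first admitted variant, so the order of `harder_variants` decides which axis grows first. A real curriculum might sample among the admitted variants instead.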

Related work (tracked elsewhere)


This takes over from #238 as the SWE pack's roadmap tracker now that the MVP has shipped.

Metadata

Assignees

No one assigned

    Labels

    help wanted (Extra attention is needed), pack-other (New / exploratory packs), roadmap (Tracked on the public roadmap)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
