SWE pack: generate worlds (not just import them) + curriculum #248

The SWE pack MVP (#238, shipped in #247) can **import** a world from a SWE-bench-style instance, prove it's well-posed, and grade an agent's edits against the repo's held-out tests. This issue tracks the next step: **generating** SWE worlds ourselves instead of only importing them — plus a curriculum that makes them harder over time.

Everything below reuses the ontology, grader, and admission self-test that already ship. Only the *source* of the world changes. Full design: [`packs/swe/DESIGN.md`](https://github.com/vecna-labs/open-range/blob/main/packs/swe/DESIGN.md) ("World-sources" + "Implementation issue tree").

## What already works (the MVP)

- Import a SWE-bench-style world — from a bundled fixture or a cloned `repo@commit` — and grade an agent's edited tree against tests it never sees.
- Two task shapes on one world: `swe.fix` (all-or-nothing repair) and `swe.build` (partial credit from unit tests, success gated by integration tests).
- Sandboxed grading, a multi-turn `run_tests` tool, and the training reward seam.
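To make the difference between the two task shapes concrete, here is a minimal sketch of how their scores might be derived from held-out test outcomes. The `TestResults` and `Grade` types and both function names are hypothetical and are not the pack's actual grader API. The sketch also assumes that `swe.build` success depends only on the integration tests; the real grader may apply further conditions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TestResults:
    """Pass/fail outcomes keyed by test id (hypothetical shape)."""
    unit: dict[str, bool]
    integration: dict[str, bool]


@dataclass(frozen=True)
class Grade:
    reward: float
    success: bool


def grade_fix(results: TestResults) -> Grade:
    """swe.fix sketch: all-or-nothing, every held-out test must pass."""
    outcomes = list(results.unit.values()) + list(results.integration.values())
    ok = bool(outcomes) and all(outcomes)
    return Grade(reward=1.0 if ok else 0.0, success=ok)


def grade_build(results: TestResults) -> Grade:
    """swe.build sketch: partial credit from unit tests, success gated by integration tests."""
    unit = list(results.unit.values())
    integration = list(results.integration.values())
    # Reward is the fraction of unit tests that pass.
    reward = sum(unit) / len(unit) if unit else 0.0
    # Assumption: success requires every integration test to pass.
    success = bool(integration) and all(integration)
    return Grade(reward=reward, success=success)


results = TestResults(unit={"test_a": True, "test_b": False},
                      integration={"test_end_to_end": True})
fix_grade = grade_fix(results)
build_grade = grade_build(results)
```

With one of two unit tests failing, the same edited tree earns nothing under the `swe.fix` sketch but half credit (and success) under the `swe.build` sketch.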

## What's next

1. **Injected worlds — generate the bug.** Start from a green repo + test suite and automatically splice in a defect (an AST edit) so the base fails and a known fix passes — no scraped issues required. Mirrors the cyber pack's vuln injector.
   - *Done when:* an injected world passes the admission self-test (bare base fails, gold fix greens the suite), with the injected defect serving as the admission mutation.

2. **Curriculum — make them harder.** Use the same `evolve(...)` / `available_mutations` seam the trading pack uses to grow harder defects, stricter suites, and thinner scaffolding.
   - *Done when:* `evolve(...)` produces a world that still passes admission and is strictly harder than its parent, demonstrated with a success-rate curve (#200).

3. **Authored worlds — let an LLM write them.** An LLM writes a feature, its tests, and the reference fix; admission's self-test rejects the ill-posed ones. Builds on (1) plus an LLM backend (#188).
   - *Done when:* an authored world admits and is solvable by its own reference fix.
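As a rough illustration of step 1, the sketch below injects a defect with an AST edit and then runs an admission-style check: the injected base must fail the suite and the gold (original) source must pass it. Everything here is a toy under stated assumptions: the `clamp` function, the `FlipFirstLt` mutation, and the `inject_defect`, `suite_passes`, and `admits` helpers are hypothetical names, and the real pack would run tests in a sandbox rather than through `exec`.

```python
import ast

# A tiny "green repo": one function that passes its suite.
GREEN_SOURCE = '''
def clamp(x, lo, hi):
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x
'''


class FlipFirstLt(ast.NodeTransformer):
    """Hypothetical defect: replace the first `<` comparison with `>`."""

    def __init__(self) -> None:
        self.done = False

    def visit_Compare(self, node: ast.Compare) -> ast.Compare:
        self.generic_visit(node)
        if not self.done and len(node.ops) == 1 and isinstance(node.ops[0], ast.Lt):
            node.ops = [ast.Gt()]
            self.done = True
        return node


def inject_defect(source: str) -> str:
    """Return source with one injected defect, or raise if no site exists."""
    injector = FlipFirstLt()
    tree = injector.visit(ast.parse(source))
    if not injector.done:
        raise ValueError("no injection site found")
    return ast.unparse(ast.fix_missing_locations(tree))


def suite_passes(source: str) -> bool:
    """Stand-in for the held-out test suite (unsandboxed, for illustration only)."""
    namespace: dict = {}
    exec(source, namespace)
    clamp = namespace["clamp"]
    return all([
        clamp(5, 0, 10) == 5,
        clamp(-1, 0, 10) == 0,
        clamp(11, 0, 10) == 10,
    ])


def admits(base: str, gold: str) -> bool:
    """Admission-style self-test: bare base fails, gold fix passes."""
    return (not suite_passes(base)) and suite_passes(gold)


injected_base = inject_defect(GREEN_SOURCE)
admitted = admits(base=injected_base, gold=GREEN_SOURCE)
```

The same `admits` check rejects a world whose "defect" does not change behaviour, which is the property that would let it filter ill-posed LLM-authored worlds in step 3.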

## Related work (tracked elsewhere)

- Stronger sandboxing (seccomp / containers) for adversarial public traffic — #202.
- Large monorepos via lazy clone-at-realize — #212.
- Per-instance dependency / container images for real-repo test execution.

---

This takes over from #238 as the SWE pack's roadmap tracker now that the MVP has shipped.

