What this is
Track Stages 2–4 of the webapp.build rigorous grader. Stage 1 landed in #224.
Where Stage 1 left it
The agent submits a handler source string in result.json; check_success runs it in a sandboxed subprocess against a held-out behavioral contract. Admission validates that the contract is well-posed: the clean reference passes, and a bug-injecting mutation breaks it. Only the api service kind emits a build task; other kinds emit none. The loop is single-shot: the agent submits once and gets no chance to iterate.
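For readers new to the grader, here is a minimal sketch of the Stage 1 check and admission logic. It assumes a hypothetical contract format (a list of request/expected-response pairs) and a handler that defines a function named handle; passes_contract, contract_is_well_posed, and _HARNESS are illustrative names, not the pack's actual API. A plain subprocess with a timeout only gives process isolation, not a real security sandbox.

```python
import json
import subprocess
import sys

# Hypothetical contract format: each case is (request dict, expected response dict).
Contract = list[tuple[dict, dict]]

# Harness run in a child interpreter: it reads the handler source and the
# contract requests from stdin as JSON, calls handle() on each request,
# and prints the list of responses as JSON.
_HARNESS = """
import json, sys
payload = json.load(sys.stdin)
namespace = {}
exec(payload["source"], namespace)
handle = namespace["handle"]
print(json.dumps([handle(req) for req in payload["requests"]]))
"""


def passes_contract(source: str, contract: Contract, timeout: float = 5.0) -> bool:
    """Run `source` in a separate process and compare outputs to the contract."""
    payload = json.dumps({"source": source, "requests": [req for req, _ in contract]})
    try:
        proc = subprocess.run(
            [sys.executable, "-c", _HARNESS],
            input=payload,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False  # a hung handler counts as a failure
    if proc.returncode != 0:
        return False  # the handler raised or failed to load
    try:
        outputs = json.loads(proc.stdout)
    except json.JSONDecodeError:
        return False  # stray output corrupted the result
    return outputs == [expected for _, expected in contract]


def contract_is_well_posed(reference: str, mutant: str, contract: Contract) -> bool:
    """Admission check: the clean reference passes and the bug-injected mutant fails."""
    return passes_contract(reference, contract) and not passes_contract(mutant, contract)
```

For example, with a reference handler returning {'status': 200, 'sum': req['a'] + req['b']} and a mutant that swaps + for -, a contract containing the case ({"a": 1, "b": 2}, {"status": 200, "sum": 3}) is admitted, because it separates the two. A contract that the mutant also passes would be rejected as too weak.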
Remaining stages
- Stage 2: expand the mutation/contract library. Add more service kinds beyond api, and richer behavioral contracts.
- Stage 3: curriculum mutations on build. Wire available_mutations for the build family so that evolve(...) hardens build tasks (today, enrichment is wired only into pentest's curriculum).
- Stage 4: multi-turn loop. Add live realizer reload and an agent test-runner tool, so the agent iterates instead of submitting once.
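The Stage 4 loop can be pictured with a small sketch. For brevity, handlers here are plain Python callables over integers rather than submitted source strings, and run_visible_tests, multi_turn_build, and TurnFeedback are hypothetical names, not the pack's API. Live realizer reload is not modeled. The agent sees failures only from a visible test set, while the final grade still comes from the held-out contract, as in Stage 1.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Case = tuple[int, int]  # hypothetical (input, expected output) pair
Handler = Callable[[int], int]


@dataclass
class TurnFeedback:
    turn: int
    failures: list[Case]  # visible cases the last handler got wrong


def run_visible_tests(handler: Handler, visible: list[Case]) -> list[Case]:
    """Hypothetical agent-facing test-runner tool: return the visible cases that fail."""
    return [(x, want) for x, want in visible if handler(x) != want]


def multi_turn_build(
    agent: Callable[[Optional[TurnFeedback]], Handler],
    visible: list[Case],
    held_out: list[Case],
    max_turns: int = 5,
) -> bool:
    """Let the agent revise its handler until visible tests pass or turns run out."""
    feedback: Optional[TurnFeedback] = None
    handler: Optional[Handler] = None
    for turn in range(max_turns):
        handler = agent(feedback)  # agent proposes a handler given prior feedback
        failures = run_visible_tests(handler, visible)
        if not failures:
            break  # agent is satisfied; stop iterating
        feedback = TurnFeedback(turn, failures)
    # Final grade uses only the held-out contract, which the agent never sees.
    return handler is not None and all(handler(x) == want for x, want in held_out)
```

In this sketch, max_turns bounds the agent's budget, and grading only on held-out cases keeps the agent from simply fitting the cases its tool exposes.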
Where
packs/cyber_webapp/cyber_webapp/families/build/
packs/cyber_webapp/cyber_webapp/codegen/ (realizer reload for Stage 4)
Acceptance
Each stage gets its own PR and acceptance criteria. This issue is the umbrella for the staged rollout described in ROADMAP.md.