webapp.build grader — Stages 2-4 (mutation library, curriculum, multi-turn) #237

@larstalian


What this is

Track Stages 2–4 of the webapp.build rigorous grader. Stage 1 landed in #224.

Where Stage 1 left it

The agent writes a handler source string to result.json; check_success runs it in a sandboxed subprocess against a held-out behavioral contract. Admission validates that the contract is well-posed: the clean reference passes and a bug-injecting mutation fails. Only the api service kind emits a build task; other kinds emit none. The loop is single-shot.
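
To make the admission rule concrete, here is a minimal sketch of a well-posedness check, not the Stage 1 implementation. The names `run_contract`, `contract_is_well_posed`, `CONTRACT`, `REFERENCE`, and `MUTANT`, and the request/response shape, are all hypothetical. A plain subprocess stands in for the sandbox and provides no real isolation.

```python
import json
import subprocess
import sys
import textwrap

# Hypothetical held-out contract: (request, expected response) pairs.
CONTRACT = [
    ({"path": "/health"}, {"status": 200}),
    ({"path": "/missing"}, {"status": 404}),
]

# Hypothetical clean reference handler, stored as a source string.
REFERENCE = textwrap.dedent('''
    def handler(request):
        if request["path"] == "/health":
            return {"status": 200}
        return {"status": 404}
''')

# Hypothetical bug-injecting mutation: unknown paths now return 200.
MUTANT = REFERENCE.replace('"status": 404', '"status": 200')

# Child-process script: loads the submitted source and replays the requests.
RUNNER = textwrap.dedent('''
    import json, sys
    payload = json.loads(sys.stdin.read())
    namespace = {}
    exec(payload["source"], namespace)
    results = [namespace["handler"](req) for req in payload["requests"]]
    print(json.dumps(results))
''')


def run_contract(source, contract, timeout=5.0):
    """Return True if the handler source satisfies every contract case."""
    payload = json.dumps(
        {"source": source, "requests": [req for req, _ in contract]}
    )
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", RUNNER],  # -I: isolated mode
            input=payload,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    if proc.returncode != 0:
        return False  # crash or syntax error counts as a failure
    try:
        actual = json.loads(proc.stdout)
    except json.JSONDecodeError:
        return False
    return actual == [expected for _, expected in contract]


def contract_is_well_posed(reference, mutant, contract):
    """Admit only if the reference passes and the mutant fails."""
    return run_contract(reference, contract) and not run_contract(mutant, contract)
```

A contract that the mutant also passes would be rejected, since it cannot distinguish a correct handler from a broken one.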

Remaining stages

  • Stage 2 — expand the mutation/contract library. More service kinds beyond api; richer behavioral contracts.
  • Stage 3 — curriculum mutations on build. Wire available_mutations for the build family so evolve(...) hardens build tasks (today enrichment is wired only on pentest's curriculum).
  • Stage 4 — multi-turn loop. Live realizer reload + an agent test-runner tool so the agent iterates instead of one-shot submitting.
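
As a rough illustration of the Stage 4 idea, the loop below lets an agent call a test-runner tool and revise its handler before the final submission. This is a sketch under assumptions: `run_tests`, `iterate`, the `agent_step` callback, and the case format are hypothetical, and the handler runs in-process only for brevity (a real tool would presumably reuse the sandbox).

```python
from typing import Callable

Case = tuple[dict, dict]  # hypothetical (request, expected response) pair


def run_tests(source: str, cases: list[Case]) -> list[str]:
    """Hypothetical test-runner tool: returns failure messages, empty if all pass."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # in-process for brevity; not sandboxed
        handler = namespace["handler"]
    except Exception as exc:
        return [f"load error: {exc!r}"]
    failures = []
    for request, expected in cases:
        try:
            actual = handler(request)
        except Exception as exc:
            failures.append(f"{request}: raised {exc!r}")
            continue
        if actual != expected:
            failures.append(f"{request}: expected {expected}, got {actual}")
    return failures


def iterate(
    agent_step: Callable[[list[str]], str],
    cases: list[Case],
    max_turns: int = 3,
) -> tuple[str, int]:
    """Call the agent with the latest feedback until tests pass or turns run out.

    Returns the last submitted source and the number of turns used.
    """
    feedback: list[str] = []
    source = ""
    for turn in range(1, max_turns + 1):
        source = agent_step(feedback)
        feedback = run_tests(source, cases)
        if not feedback:
            return source, turn
    return source, max_turns
```

The cases passed to the tool here would be visible examples, kept separate from the held-out contract that check_success grades against.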

Where

  • packs/cyber_webapp/cyber_webapp/families/build/
  • packs/cyber_webapp/cyber_webapp/codegen/ (realizer reload for Stage 4)

Acceptance

Each stage gets its own PR and its own acceptance criteria. This issue is the umbrella for the staged rollout described in ROADMAP.md.

Labels: pack-cyber (Cyber pack work), roadmap (Tracked on the public roadmap)
