Cyber: self-verifying generation + real-container backing (leak oracle, LLM admission) #266
Merged
…ation indictment
Gym change: the §8.3 "any HIDDEN value leaked" consequence oracle, wired live:
- consequence.py: detect_leak / guarded_values over HIDDEN value_ref nodes, the independent leak verifier, length-floored so a short value can't false-positive.
- codegen bakes guarded_values(graph) into seed.json; the rendered app scans each response boundary and logs leaked node ids only (never the secret value); realize surfaces final_state["leaked_secret_ids"]; check_success consumes it in `reason`.
- Boundary held: success/subgoals, and thus the trainer's averaged reward rungs, are unchanged. Rewarding on the leak is a trainer-coordinated follow-up (#198).
Research tooling, experiments/indictment:
- An independent harness running LLM-generated worlds to measure the admit-gap: worlds a self-verifying loop ships that independent probes reject as trivial or unfaithful. 89 worlds across 4 classes; gap is a small, consistent ~2-4%. The harder finding: a reliable independent verifier is itself hard to build (false negatives AND positives, each found by hand-auditing). See RESULTS.md.
DESIGN.md §8 documents the verification ladder, the spine, and the honest results (incl. the raw-substring oracle's known encoded-leak limitation). Live-validated end to end (a real SQLi episode records the leak by node id).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
#259 review)
The adversarial review of the §8.3 spine flagged two real (latent) limitations in the raw-substring oracle; close them:
- Encoded exfil: detect_leak and the rendered app scanner now search for each guarded value AND its cheap reversible encodings (base64 / hex / percent) by encoding the needle, so an encoded leak is caught, not only the literal form. gzip / binary / multibyte splits remain out (documented).
- Containment: detect_leak drops a guarded value that is a proper substring of another leaked value (offline / grader path; the live per-response signal stays raw, since the scanner logs ids not values).
An agreement test pins the rendered app scanner to consequence.value_variants so live and offline verdicts can't drift. DESIGN.md §8.4 corrected (no longer "deferred").
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
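A minimal sketch of the encoded-exfil and containment behaviour described in this commit. It is not the code in consequence.py: the function bodies, the `MIN_GUARDED_LEN` floor value, and the `dict[str, str]` shape for guarded values are assumptions made for illustration. Only the names `value_variants` and `detect_leak` come from the commit message.

```python
import base64
from urllib.parse import quote

# Assumed length floor; the commit says short values are floored but gives no number.
MIN_GUARDED_LEN = 8


def value_variants(value: str) -> set[str]:
    """The literal value plus its cheap reversible encodings (base64 / hex / percent)."""
    raw = value.encode("utf-8")
    return {
        value,
        base64.b64encode(raw).decode("ascii"),
        raw.hex(),
        quote(value, safe=""),
    }


def detect_leak(response: str, guarded: dict[str, str]) -> set[str]:
    """Return the ids of guarded values found in `response` in any variant."""
    hits = {
        node_id
        for node_id, value in guarded.items()
        if len(value) >= MIN_GUARDED_LEN
        and any(variant in response for variant in value_variants(value))
    }
    # Containment: drop a hit whose value is a proper substring of another
    # leaked value, because it only matched as part of the longer one.
    return {
        a
        for a in hits
        if not any(
            guarded[a] != guarded[b] and guarded[a] in guarded[b] for b in hits
        )
    }


guarded = {"flag": "FLAG{s3cr3t-value}", "part": "s3cr3t-value", "short": "abc"}
encoded = base64.b64encode(guarded["flag"].encode("utf-8")).decode("ascii")
print(detect_leak("body=" + encoded, guarded))
```

Encoding the needle rather than decoding the response keeps the scan a plain substring search, at the cost of missing a base64 leak that is not aligned to the start of the encoded stream.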
experiments/ (the indictment harness + 89 generated test worlds + writeup) was a one-off validation run, not gym/pack code — it doesn't belong in the repo. Its findings stay documented in DESIGN.md §8.10 as prose; the pack verifier (consequence.py) is unaffected — it lives in the pack, where it belongs. The coupled tests/test_indictment_harness.py goes with it. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Drop references that .rules forbids in code comments: a section pointer
("§8.3 spine"), a forward/"future" reference ("emergent mode"), and a docstring
naming a specific test (rot-prone). The remaining comments are non-obvious WHY.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
)
The realization primitive of the emergent-mode ladder (DESIGN.md §9). Today's admission is structural (a graph-path check); an LLM-realized handler can be wrong, so it is admitted DYNAMICALLY: run the intended exploit + a benign request and let the consequence verifier decide. The exploit must leak the flag, the benign request must not. Accept iff solvable and not trivial.
- realize_admit.py (pack): the pure pieces, classify_admission (the verdict, over consequence.detect_leak) + cmdi_exploit_and_benign (the per-class exploit oracle). Running an episode is a host concern (packs must not import openrange), so the orchestration lives with the caller, not the pack.
- codegen: a vuln node's realized_handler stands in for the template; it is the hook the LLM realization writes through.
- DESIGN.md §9: the M0->M3 realization ladder (procedural architects, LLM realizes, admission verifies, freeze), mapping M1/M2/M3 to #252 / #212 / #235 / #189.
Validated end to end: a faithful command_injection world is accepted; trivial and not-solvable verdicts are exercised. 100% branch coverage; import-boundary clean.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
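The accept / trivial / not-solvable verdict can be sketched as below. This is an illustration under assumptions, not realize_admit.py: the `Verdict` enum, the plain substring check (standing in for consequence.detect_leak), the in-memory `FILES` table, and the three stand-in handlers are all hypothetical.

```python
from enum import Enum
from typing import Callable


class Verdict(Enum):
    ACCEPT = "accept"
    TRIVIAL = "trivial"
    NOT_SOLVABLE = "not_solvable"


def classify_admission(exploit_response: str, benign_response: str, flag: str) -> Verdict:
    """Accept iff the exploit leaks the flag and the benign request does not."""
    if flag in benign_response:
        return Verdict.TRIVIAL  # leaks without any exploit
    if flag not in exploit_response:
        return Verdict.NOT_SOLVABLE  # the intended exploit recovers nothing
    return Verdict.ACCEPT


FLAG = "FLAG{example-only}"
FILES = {"/flag.txt": FLAG}  # hypothetical in-memory stand-in for the world's files


def faithful(arg: str) -> str:
    """Stand-in handler: 'pings' the host, then runs any chained `cat <path>`."""
    out = ["pong"]
    for part in arg.split(";")[1:]:
        words = part.split()
        if len(words) == 2 and words[0] == "cat":
            out.append(FILES.get(words[1], ""))
    return "\n".join(out)


def admit(handler: Callable[[str], str]) -> Verdict:
    exploit, benign = "127.0.0.1; cat /flag.txt", "127.0.0.1"
    return classify_admission(handler(exploit), handler(benign), FLAG)


print(admit(faithful), admit(lambda arg: FLAG), admit(lambda arg: "pong"))
```

Checking the benign response first means a handler that leaks on every request is reported as trivial rather than accepted.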
CI runs mypy over tests/; the rendered-app exec helper retrieved values from a dict[str, object] namespace, so calling them tripped "object not callable". Type the namespace dict[str, Any] so the pulled-out functions are callable. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
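A small sketch of the typing issue, assuming a helper shaped roughly like the one described; `load_function` and `SOURCE` are hypothetical names. With `dict[str, object]`, mypy rejects calling a value pulled out of the namespace; `dict[str, Any]` lets the call type-check.

```python
from typing import Any

SOURCE = "def greet(name):\n    return 'hello ' + name\n"


def load_function(source: str, name: str) -> Any:
    """Exec `source` into a fresh namespace and return one of its definitions."""
    # dict[str, object] here would make `namespace[name](...)` an
    # "object not callable" error under mypy; Any keeps it callable.
    namespace: dict[str, Any] = {}
    exec(source, namespace)
    return namespace[name]


greet = load_function(SOURCE, "greet")
print(greet("world"))
```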
Inject a stand-in "realized" handler into a command-injection world and run it through codegen -> runtime -> the admission gate:
- a different-but-real handler (splits on ';', cats the file) is ACCEPTED: the gate lets in varied implementations, which is the diversity M0 is for;
- a handler that returns the flag on any request is REJECTED as trivial;
- a handler that never leaks is REJECTED as not solvable.
Confirms the codegen hook and the gate work together on a live episode. The live LLM-writes-the-handler step plugs in on top (non-deterministic, so a demo not a test).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…dex harness) Drives the codex LLM backend to write a command-injection handler, injects it, and runs it through the dynamic admission gate (accept iff the exploit leaks the flag and a benign request does not). Accepted handlers are the model's own varied implementations; trivial or broken ones are rejected — the reusable entry point for autonomous LLM realization, the live step on top of M0's already-tested gate. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A second LLMBackend alongside CodexBackend, driving `claude -p --output-format json`. Claude has no output-schema flag, so a structured request asks for a JSON object in the prompt and parses it out of the reply (bare or ```-fenced). Useful where codex is unavailable (quota-limited) or declines a task it flags as risky — claude authors the cyber gym's handlers that codex won't. examples/cyber_realize now selects the backend (--backend claude|codex, default claude). Ran live: claude wrote 5 distinct command-injection handlers, all 5 ACCEPTED by the dynamic admission gate (exploit leaks the flag, a benign request does not) — the M0 realization loop closed end to end with a real model. Fake-CLI tests (no mocks) cover parsing, fenced JSON, flag passing, failures, timeouts. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
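Parsing a JSON object out of a reply that may be bare or fenced could look roughly like this. It is a sketch, not the ClaudeBackend code: `extract_json_object` is a hypothetical name, and the fence marker is built from single backticks only so the example can sit inside a fenced block.

```python
import json
import re

FENCE = "`" * 3  # the three-backtick fence marker

# Optional "json" language tag, then everything up to the closing fence.
_FENCED = re.compile(FENCE + r"(?:json)?\s*\n(.*?)" + FENCE, re.DOTALL)


def extract_json_object(reply: str) -> dict:
    """Return the JSON object in `reply`, whether bare or inside a fence."""
    match = _FENCED.search(reply)
    candidate = match.group(1) if match else reply
    parsed = json.loads(candidate.strip())  # raises ValueError on bad JSON
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object")
    return parsed


fenced_reply = "Here it is:\n" + FENCE + 'json\n{"handler": "x"}\n' + FENCE
print(extract_json_object(fenced_reply))
print(extract_json_object('{"handler": "y"}'))
```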
container.image_files packages a world's rendered app into a Docker build context (Dockerfile + app.py + seed.json). A docker-gated test proves the real path end to end: build the image, run the container, and recover the flag by exploiting the world over HTTP. This containerizes the existing in-memory app — the runtime foundation for Backing.CONTAINER. Making the exploits hit the container's real fs/shell, and wiring this in as the Backing.CONTAINER runtime, are the next M1 increments. Caveat: the seed (with the flag) is COPYed into the image for now (an image layer until the app unlinks it at startup); mounting it at run time is the follow-up. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…er (#252) At CONTAINER backing, container.realfs_cmdi_app builds a stdlib real-shell app: the injected input runs through a real `sh -c` against the real filesystem, and the flag is a real file (written from the OPENRANGE_FLAG env var at startup, never in an image layer). So `; cat <path>` actually executes — genuine RCE/file-read, not the in-memory emulation. A docker-gated test proves it: build, run with the flag env, and a real `cat` recovers the real file's flag over HTTP; a benign request does not. Scope: command_injection, plain `; cat` injection. Re-applying the mutually-exclusive contexts of §6 over the real shell, and a ContainerRuntime that selects this per class via Backing.CONTAINER, are the next increments. Real RCE runs inside the container sandbox; hardened isolation is #202. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The container backing's real-shell handler now applies the same naive,
context-specific filter the in-memory emulation uses, so the three
mutually-exclusive injection contexts (separator / substitution / quoted)
survive the move from emulation to a REAL `sh -c`:
- separator strips $()/backticks, keeps ; | &
- substitution strips ; | & newline, keeps $()/backticks
- quoted wraps the arg in QUOTE + strips $(); the exploit must
break the quote
A docker-gated, parametrized test proves it end to end: a world built for
one context is exploited by THAT context's payload and is NOT exploited by
another context's payload (the wrong vectors are filtered before sh). This
carries the §6 validity work from the in-memory path to Backing.CONTAINER.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
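The three filters listed above can be sketched as a single function. This is an assumption-laden illustration, not the handler template: `filter_arg` is a hypothetical name, "strips $()" is read as removing the characters `$`, `(` and `)`, and QUOTE is assumed to be a single quote. The filters are deliberately naive, since each one is meant to leave exactly one injection vector open.

```python
import re

QUOTE = "'"  # assumed quote character


def filter_arg(context: str, arg: str) -> str:
    """Apply the naive, context-specific filter before the arg reaches the shell."""
    if context == "separator":
        # Strip command substitution; keep ; | & so chaining still works.
        return re.sub(r"[$()`]", "", arg)
    if context == "substitution":
        # Strip separators; keep $() and backticks so substitution still works.
        return re.sub(r"[;|&\n]", "", arg)
    if context == "quoted":
        # Strip $() and wrap in quotes; the exploit has to break out of the quote.
        return QUOTE + re.sub(r"[$()]", "", arg) + QUOTE
    raise ValueError(f"unknown context: {context}")


for context, payload in [
    ("separator", "x; cat /flag"),
    ("separator", "$(cat /flag)"),
    ("substitution", "x; cat /flag"),
    ("substitution", "$(cat /flag)"),
    ("quoted", "x'; cat /flag; '"),
]:
    print(context, repr(payload), "->", repr(filter_arg(context, payload)))
```

A payload written for one context loses its operative characters under the other filters, which is what keeps the contexts mutually exclusive.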
Generalize the CONTAINER backing past command_injection by putting the "real" mode into the ONE generated multi-service app, not another bespoke per-class app. The container sets OPENRANGE_REALFS; the rendered app then serves its `files` surface from a real filesystem (`_RealFiles`, a real open() per path) instead of the in-memory dict. The PROCESS backing never sets the env and stays byte-for-byte the in-memory emulation.
This makes the whole file-read shape genuinely real with ZERO handler changes: a path-traversal escape is real OS path resolution against the container fs, and the cmdi readers `cat` real files. Proven by a docker-gated, parametrized test across the three confinement contexts (absolute_only / relative / dotdot_filter): each world is read only by its own technique's payload and neutralizes the others. This is the same mutually-exclusive-contexts guarantee the in-memory emulation makes, now holding over a real fs.
The generated app is also the surface the next milestone containerizes and reads the request-log leak signal from, so this is foundation, not a throwaway. The stdlib image_files_realfs variant remains the standalone real-`sh -c` proof for code_exec until that folds into the generated app too.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…oke variant (M1)
Complete the CONTAINER unification: command_injection now runs a real `sh -c` inside the ONE generated multi-service app under the same OPENRANGE_REALFS gate, so the container is a single app with every shape real and no parallel per-class app. The bespoke `image_files_realfs` / `realfs_cmdi_app` stepping stone is removed.
- command_injection.py.j2 gains a real-shell branch (gated by OPENRANGE_REALFS): the same naive, context-specific §6 filter, then a real `sh -c`. PROCESS leaves the env unset and stays the in-memory emulation byte-for-byte.
- The image installs the diagnostic tools base_command samples from (ping/nslookup/dig/host/traceroute) so the real shell behaves like a real vulnerable endpoint: a chained `; cat` reads the flag, and `$(cat flag)` leaks it too because each tool echoes the flag-as-hostname in its resolver error. This is the faithfulness the bespoke app hid by hardcoding `echo`.
- container.py drops the bespoke variant + its now-unused imports; the docker-gated cmdi tests retarget onto `image_files` (the generated app), still proving the three §6 contexts mutually exclusive over a real shell.
- DESIGN.md §9 M1-status updated: file_read + code_exec both real on the one app; remaining M1 is ssti-unsandboxed then isolation (#202).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A world is the target the agent attacks over HTTP — not the agent's toolbox. So its image should carry only what its OWN vulns run server-side, not a diagnostic toolkit every world drags along. Replace the unconditional 5-tool apt-install with `required_apt_packages(graph)`: it returns only the apt packages the world's command_injection vulns need (base_command → package, union across vulns), and `_dockerfile()` skips the apt layer entirely when the set is empty. A path-traversal / SQLi-only world now builds a lean image with no OS tools; a cmdi world installs only its own base_command's tool (e.g. `dnsutils` for nslookup/dig/host). The base_command tool belongs in the TARGET container because the server runs it as the vulnerability — confirmed against the codebase's world/agent split (`base_url` = the world, `solver_root` = the agent's own workspace; "bring your own agent harness"). The agent's recon/exploit tooling lives in its separate sandbox, not the world image. DESIGN.md §9 gains a plain-language "Two environments, not one" note. Tests: required_apt_packages scopes to the world's tool (empty for file-read); the Dockerfile installs OS tools only when needed; a lockstep guard asserts every sampled base_command maps to a package (else a cmdi world would silently ship without its tool). Verified empirically that all five diagnostic tools reflect their argument on python:3.13-slim, so scoping never breaks the substitution exploit. Docker-gated cmdi + path_traversal tests still pass. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
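The scoping rule can be sketched as follows. The list-of-dicts vuln shape, the `kind` / `params` keys, and the `dockerfile` helper are assumptions for illustration (the real function takes a graph). The `dnsutils` mapping comes from the commit message; the `ping` and `traceroute` package names are the usual Debian ones and are assumed here.

```python
from collections.abc import Iterable, Mapping

# base_command -> apt package. dnsutils is named in the commit; the other
# two entries are assumed Debian package names.
BASE_COMMAND_PACKAGES = {
    "ping": "iputils-ping",
    "nslookup": "dnsutils",
    "dig": "dnsutils",
    "host": "dnsutils",
    "traceroute": "traceroute",
}


def required_apt_packages(vulns: Iterable[Mapping[str, object]]) -> set[str]:
    """Union of the packages this world's command_injection vulns run server-side."""
    packages: set[str] = set()
    for vuln in vulns:
        params = vuln.get("params")
        if vuln.get("kind") != "command_injection" or not isinstance(params, Mapping):
            continue  # other classes and malformed params need no OS tools
        base_command = params.get("base_command")
        if isinstance(base_command, str) and base_command in BASE_COMMAND_PACKAGES:
            packages.add(BASE_COMMAND_PACKAGES[base_command])
    return packages


def dockerfile(packages: set[str]) -> str:
    """Render a Dockerfile, skipping the apt layer when no packages are needed."""
    lines = ["FROM python:3.13-slim"]
    if packages:
        lines.append(
            "RUN apt-get update && apt-get install -y --no-install-recommends "
            + " ".join(sorted(packages))
            + " && rm -rf /var/lib/apt/lists/*"
        )
    lines += ["WORKDIR /app", "COPY app.py seed.json ./", 'CMD ["python", "app.py"]']
    return "\n".join(lines) + "\n"


cmdi_world = [{"kind": "command_injection", "params": {"base_command": "nslookup"}}]
print(dockerfile(required_apt_packages(cmdi_world)))
```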
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The CONTAINER backing runs attacker-controlled code (a real `sh -c`, a real filesystem), so contain it. `hardening_run_args()` returns the `docker run` flags — `--cap-drop=ALL`, `--security-opt=no-new-privileges`, and memory / cpu / pids caps — so an exploit can't escalate, fork-bomb, or exhaust the host. It's a reusable building block the #252 CONTAINER runtime will run with; the docker tests now run every world through it. A docker-gated test proves the containment is real, not just configured: `docker inspect` shows the caps dropped + limits set, and `cat /proc/self/status` inside the container shows CapEff all-zero. Crucially the world stays exploitable over HTTP under the flags (the DNS-resolution leak and the `cat` chain need no capabilities), so containment doesn't break the vuln. This is task 1 of #265. Read-only-rootfs + egress-blocking + flag-out-of-image remain there (read-only needs the app's writable-path rework; egress rides the M2 network rung). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
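A sketch of what such a helper might return. The three flag families are named in the commit; the default limit values, the keyword parameters, and the `docker_run_command` wrapper are assumptions.

```python
def hardening_run_args(
    memory: str = "256m",   # assumed default
    cpus: str = "1.0",      # assumed default
    pids_limit: int = 128,  # assumed default
) -> list[str]:
    """`docker run` flags that drop capabilities and cap resources."""
    return [
        "--cap-drop=ALL",
        "--security-opt=no-new-privileges",
        f"--memory={memory}",
        f"--cpus={cpus}",
        f"--pids-limit={pids_limit}",
    ]


def docker_run_command(image: str, flag: str) -> list[str]:
    """Hypothetical wrapper: run the world hardened, with the flag passed by env."""
    return [
        "docker", "run", "--rm",
        *hardening_run_args(),
        "-e", f"OPENRANGE_FLAG={flag}",
        image,
    ]


print(" ".join(docker_run_command("world-image:example", "FLAG{example-only}")))
```

Returning a plain list keeps the flags reusable by any caller that assembles its own `docker run` argv.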
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Code comments/docstrings shouldn't carry roadmap-phase or task tags (M0/M1, DESIGN.md §-refs, issue numbers) — they rot, and commits/PRs are the place for that context. Strip them from the new code, keeping the WHY. Design-doc references stay in DESIGN.md, not in the code. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Pin hardening_run_args' contract and required_apt_packages' defensive branches (non-mapping params, unmapped base_command) with non-docker unit tests, so container.py hits 100% branch coverage without needing docker. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The container code was behind a NotImplementedError — episodes couldn't reach it. Wire it: ContainerWebappRuntime runs the world as a real Docker container, and WebappPack.realize() routes CONTAINER to it. It reuses SubprocessRuntime by treating `docker run` (foreground) as the supervised child — the container's app prints the same startup line a local subprocess would, the published host port is resolved with `docker port`, and the request log is read out of the running container (`docker exec cat`). A `_read_log_bytes()` seam shares all the existing log/surface/collect logic between the two backings; PROCESS stays byte-for-byte the same. Load-bearing parity test: the SAME snapshot + SAME exploit grades identically on PROCESS (in-memory emulation) and CONTAINER (a real shell in a real container) — only fidelity changes, not the task surface. Plus unit tests for the backing routing, the construct-without-docker path, and image reuse across resets. New code is at 100% branch coverage (one container-gone guard pragma'd). Scope: one container for the whole world. Multiple per-service containers on a real network is the networked-services work (#212 / #235), not this. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
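Resolving the published host port might look roughly like this. `parse_docker_port` and `published_port` are hypothetical names, and the container port 8000 is an assumption. `docker port` prints one `host:port` mapping per line, so the sketch takes the port from the first line.

```python
import subprocess


def parse_docker_port(output: str) -> int:
    """Extract the host port from `docker port` output such as '0.0.0.0:49153'."""
    for line in output.splitlines():
        line = line.strip()
        if line:
            return int(line.rsplit(":", 1)[1])
    raise ValueError("docker port printed no mapping")


def published_port(container: str, container_port: int = 8000) -> int:
    """Ask docker which host port a running container's port is published on."""
    result = subprocess.run(
        ["docker", "port", container, f"{container_port}/tcp"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_docker_port(result.stdout)


print(parse_docker_port("0.0.0.0:49153\n[::]:49153\n"))
```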
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
CONTAINER is now wired, so it no longer surfaces NotImplementedError. Prove the backing selector reaches pack.realize with a still-unwired backing. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ling branch #259 landed on main as a squash of this branch's early leak-oracle + admission work. This branch already contains all of it (consequence.py is byte-identical) plus the M0/M1/container/runtime work built on top, so -s ours keeps the superset tree and records main as merged. Verified main has no unique content: every line it has that this branch lacks is a superseded older version. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
What this does
Makes the cyber gym self-verifying and transfer-real, in layers that build on each other. (The leak/consequence oracle from #259 is already on main; this PR builds on it.)
- LLM realization behind a dynamic admission gate (#260). The LLM can write a vuln handler; we don't trust it. We render it into a procedurally-built world, run the intended exploit and a benign request, and let the consequence oracle decide: the exploit must leak the flag, the benign request must not. Accept iff solvable-and-not-trivial. Driven by a new `ClaudeBackend` for the `claude` CLI, since codex declines the cyber task.
- A real-container backing, wired as a runtime (#252). The same generated app the in-memory `PROCESS` backing runs, but as a real container that episodes actually use (`ContainerWebappRuntime`, selected by `Backing.CONTAINER`), with `OPENRANGE_REALFS` set so surfaces go real:
  - … `open()` against a real filesystem: a traversal escape is real OS path resolution, not a dict lookup;
  - … `sh -c`, with the three mutually-exclusive injection contexts preserved;
  It reuses the subprocess runtime (`docker run` is the supervised child), resolves the published host port with `docker port`, and reads the leak signal out of the running container. It's all additive: the `PROCESS` backing stays byte-for-byte the same.
Why
The cyber gym's value is bounded by its verifier; a self-verifying loop will ship trivial or unfaithful worlds unless an independent consequence-verifier rejects them. This builds the realization gate and the container backing it runs on, moving generation toward the LLM (variety, scale) while keeping correctness with procedural + admission. Design in `packs/cyber_webapp/DESIGN.md` §8 (the verifier) and §9 (the staged plan: process → container → networked → cluster, each stage tracked by its own issue).
Testing
- … `strands` extra).
- … `trl.GRPOTrainer` GRPO step over a live SWE and cyber world (HTTP tools); both pass (`OPENRANGE_LIVE_TRL=1`).
- … `PROCESS` and `CONTAINER`: only fidelity changes, not the task surface.
- … `CapEff` all-zero inside, still exploitable under the flags).
- … `test_trl_cyber`); new modules at 100% branch coverage; no mocks: real subprocesses (docker, fake-CLI scripts), real HTTP, real episodes.
Scope / deferred (tracked)
- `Backing.CONTAINER` realizer for cyber_webapp #252. Remaining: one container per service node on a real network (multi-service); this PR does one container for the whole world.
Notes
`origin/main` (the #259 leak-oracle squash) is merged in. This branch already contained that work (`consequence.py` is byte-identical) plus everything built on top, so the diff collapses to the net-new M0/M1/container/runtime work. Follows `.rules` (integration tests, no mocks, comments WHY-only, no roadmap/phase tags in code). The stray `open-range.zip` is untracked and not part of this PR.