M0: LLM node-realization behind a dynamic admission gate (cyber pack) #260

@larstalian

Part of the emergent-mode realization ladder (packs/cyber_webapp/DESIGN.md §9). This is the realization primitive that every higher rung is built from.

Goal

Let an LLM realize a vulnerability handler — a varied implementation of a vuln class — inside the procedurally-generated graph/flag/structure, instead of rendering a fixed template. Procedural stays the architect; the verifier is the gate.

The dynamic admission gate (the core)

Today's admission is structural (check_feasibility: a graph path exists). An LLM-realized handler can be wrong, so it needs dynamic admission:

  1. Render the world with the LLM handler.
  2. Run the intended exploit for the class.
  3. Confirm the flag actually leaks, via consequence.detect_leak (the live oracle from "feat(cyber): detect leaked secrets in responses, not just submitted flags", #259).
  4. Confirm a benign request does not leak (not trivially solvable) — the shortcut probe.
  5. (optional) A computed control confirms the mechanic genuinely executes — the faithfulness probe.

Accept iff solvable-and-not-trivial; else reject / regenerate.
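
The gate above can be sketched as a pure function over an already-rendered handler. This is a minimal sketch under stated assumptions, not the pack's implementation: `Handler`, `Admission`, `admit`, and `first_admitted` are hypothetical names, the handler is modelled as a plain request-to-response callable (step 1, rendering the world, is assumed to have happened already), and `detect_leak` is a substring stand-in for `consequence.detect_leak`, whose real signature is not shown in this issue.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical model: a rendered handler maps a request string to a response body.
Handler = Callable[[str], str]


@dataclass(frozen=True)
class Admission:
    accepted: bool
    reason: str


def detect_leak(response: str, flag: str) -> bool:
    """Stand-in for consequence.detect_leak (assumed: plain substring match)."""
    return flag in response


def admit(
    handler: Handler,
    flag: str,
    exploit: str,
    benign: str,
    control: Optional[Tuple[str, str]] = None,
) -> Admission:
    """Accept iff the exploit leaks the flag and the benign request does not.

    `control` is an optional (request, expected_value) pair for the
    faithfulness probe: the expected value must appear in the response.
    """
    try:
        exploit_response = handler(exploit)  # step 2: intended exploit
        benign_response = handler(benign)    # step 4: shortcut probe input
        control_response = handler(control[0]) if control else None
    except Exception as exc:  # an LLM-written handler may simply crash
        return Admission(False, f"broken: handler raised {exc!r}")

    # Step 3: the flag must actually leak under the intended exploit.
    if not detect_leak(exploit_response, flag):
        return Admission(False, "unsolvable: intended exploit did not leak the flag")
    # Step 4: a benign request must not leak it.
    if detect_leak(benign_response, flag):
        return Admission(False, "trivial: benign request leaked the flag")
    # Step 5 (optional): the computed control must show up in the response.
    if control is not None and control[1] not in control_response:
        return Admission(False, "unfaithful: computed control missing from response")
    return Admission(True, "solvable and not trivial")


def first_admitted(
    candidates: Iterable[Handler],
    flag: str,
    exploit: str,
    benign: str,
    control: Optional[Tuple[str, str]] = None,
) -> Optional[Handler]:
    """Reject / regenerate: return the first candidate the gate accepts, else None."""
    for handler in candidates:
        if admit(handler, flag, exploit, benign, control).accepted:
            return handler
    return None
```

Returning a reason string alongside the verdict keeps rejections debuggable, and feeding `first_admitted` a lazy generator of candidates would let the regenerate loop call the LLM only as often as needed.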

Why

Scope

  • PROCESS backing (the handler is Python in the existing app); no container needed (that is "Implement a plain Backing.CONTAINER realizer for cyber_webapp", #252 / M1).
  • The deterministic core (the admission gate) is unit-tested against hand-built handler variants — faithful → accept, trivial/broken → reject. The LLM-realization step is proven by a live run (no mocks).
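
One way the hand-built variant tests might look, as a self-contained sketch. Everything here is illustrative and assumed rather than taken from the pack: the toy vuln class is a file read over an in-memory map, `gate` is a compact stand-in for the real admission gate, and the variant names (`faithful`, `trivial`, `broken`, `canned`) are hypothetical. The `canned` variant shows why the faithfulness probe is useful: it leaks on the exact exploit string without executing the mechanic.

```python
import unittest
from typing import Callable, Dict

FLAG = "FLAG{demo}"
Files = Dict[str, str]
Handler = Callable[[str], str]


def faithful(files: Files) -> Handler:
    # Really reads from the file map, so exploit and control both work.
    return lambda path: files.get(path, "404")


def trivial(files: Files) -> Handler:
    # Leaks the flag on every request, including benign ones.
    return lambda path: files["secret/flag.txt"]


def broken(files: Files) -> Handler:
    # Never leaks: the intended exploit cannot succeed.
    return lambda path: "404"


def canned(files: Files) -> Handler:
    # Special-cases the exploit string instead of reading the file map.
    return lambda path: FLAG if path == "secret/flag.txt" else "ok"


def gate(factory: Callable[[Files], Handler], control_value: str = "control-1234") -> bool:
    """Compact stand-in for the admission gate over the toy file-read class."""
    files = {
        "index.html": "hello",
        "secret/flag.txt": FLAG,
        "control.txt": control_value,  # computed control for the faithfulness probe
    }
    handler = factory(files)
    solvable = FLAG in handler("secret/flag.txt")        # exploit leaks the flag
    shortcut = FLAG in handler("index.html")             # benign request leaks it
    mechanic = control_value in handler("control.txt")   # mechanic really executes
    return solvable and not shortcut and mechanic


class AdmissionGateTest(unittest.TestCase):
    def test_variants(self) -> None:
        cases = [(faithful, True), (trivial, False), (broken, False), (canned, False)]
        for factory, expected in cases:
            with self.subTest(variant=factory.__name__):
                self.assertEqual(gate(factory), expected)
```

Saved as a module, this would run under `python -m unittest`. Because every variant is deterministic, the gate's accept/reject logic can be pinned down without involving an LLM; the live run then only has to show that realized handlers pass through the same gate.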

Out of scope (later rungs)

Real fs/shell exec-effect faithfulness (rides the container, #252 / #202); multi-service + networking (#212, #235); k8s (#189).

Metadata

    Labels

    pack-cyber (Cyber pack work), research (Exploratory / no near-term plan), roadmap (Tracked on the public roadmap)