The project ships three domain adapters: pr_maintenance (any git repo), self_improve (CSIS itself), and lean_math (Lean formal math, with graceful fallback). A fourth or fifth would meaningfully extend what the system can actually be benchmarked against.
What needs to exist
A new file csis/domains/<name>.py exporting a class that implements three methods:
- graders() -> GraderRegistry: V1 graders specific to the domain
- curiosity() -> Curiosity: frontier-item generator for this domain
- can_run() -> ReadyCheck: graceful fallback (e.g., return ReadyCheck(ready=False, reason='lean CLI not installed') if a dependency is missing)
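The three-method shape described above could look roughly like the sketch below. The ReadyCheck, GraderRegistry, and Curiosity classes defined here are hypothetical stand-ins so the example runs on its own; the real CSIS types may have different fields and constructors. CtfDomain, its grader, its puzzle IDs, and its required tool are also invented for illustration.

```python
import shutil
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# Hypothetical stand-ins: the real csis types may be shaped differently.
@dataclass
class ReadyCheck:
    ready: bool
    reason: str = ""


@dataclass
class GraderRegistry:
    graders: Dict[str, Callable[[str], bool]] = field(default_factory=dict)

    def register(self, name: str, fn: Callable[[str], bool]) -> None:
        self.graders[name] = fn


@dataclass
class Curiosity:
    frontier: List[str] = field(default_factory=list)


class CtfDomain:
    """Illustrative adapter for a hypothetical CTF domain."""

    # Assumed external dependency, chosen only for the example.
    required_tool = "python3"

    def graders(self) -> GraderRegistry:
        registry = GraderRegistry()
        # In this sketch a grader is a predicate over the Builder's output.
        registry.register(
            "flag_format",
            lambda out: out.startswith("flag{") and out.endswith("}"),
        )
        return registry

    def curiosity(self) -> Curiosity:
        # Frontier items are plain puzzle IDs in this sketch.
        return Curiosity(frontier=["puzzle-001", "puzzle-002"])

    def can_run(self) -> ReadyCheck:
        # Report a reason instead of raising when the dependency is missing.
        if shutil.which(self.required_tool) is None:
            return ReadyCheck(
                ready=False, reason=f"{self.required_tool} not installed"
            )
        return ReadyCheck(ready=True)
```

Returning a ReadyCheck rather than raising lets the caller decide whether to skip the domain or fall back to another one.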
Domain candidates
| Domain | Why interesting |
|---|---|
| CTF puzzle solving | Concrete benchmark with quantified scoring (each puzzle = 1 point) |
| Reverse engineering | Builder produces analysis reports; Verifier checks against ground truth |
| Security audit | Builder produces vulnerability reports; Verifier checks against known CVEs |
| Scientific literature | Builder summarizes a paper; Verifier checks against the paper's actual abstract |
| Code review of public PRs | A productized version of pr_maintenance against trending open-source PRs |
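
As an illustration of the "each puzzle = 1 point" scoring in the CTF row, a grader could be as small as the sketch below. The function name, the dict-based inputs, and the exact-match rule are assumptions made for the example, not part of CSIS.

```python
from typing import Dict


def score_ctf(submitted: Dict[str, str], ground_truth: Dict[str, str]) -> int:
    """Return one point per known puzzle whose submitted flag matches exactly."""
    return sum(
        1
        for puzzle_id, flag in ground_truth.items()
        if submitted.get(puzzle_id, "").strip() == flag
    )


truth = {"p1": "flag{alpha}", "p2": "flag{beta}", "p3": "flag{gamma}"}
answers = {"p1": "flag{alpha}", "p2": "flag{wrong}", "p4": "flag{extra}"}
print(score_ctf(answers, truth))  # 1
```

Only puzzles present in the ground truth can score, so extra submissions such as p4 are ignored rather than penalized.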
Acceptance criteria
- New domain module under csis/domains/
- Plumbed into csis/daemon.py:_select_domain() with a CLI flag
- At least one passing regression test
- README + RUN.md updated with the new domain
- Domain-specific graders are pinned (cycle-1 F2 / cycle-2 P3 — pinned source-hash check at cert build)
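
Two of the criteria above, the CLI-flag plumbing and the pinned source hash, can be sketched as follows. This is a minimal illustration under stated assumptions: the --domain flag name, the DOMAINS registry, the fallback policy, and the helpers file_sha256 and grader_is_pinned are all hypothetical, and the real _select_domain() and cert-build check may work differently.

```python
import argparse
import hashlib
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass
class ReadyCheck:  # hypothetical stand-in for the real csis type
    ready: bool
    reason: str = ""


class PrMaintenance:
    def can_run(self) -> ReadyCheck:
        return ReadyCheck(ready=True)


class CtfDomain:
    def can_run(self) -> ReadyCheck:
        # Simulates a missing dependency to exercise the fallback path.
        return ReadyCheck(ready=False, reason="puzzle corpus not found")


# Hypothetical registry mapping flag values to adapter classes.
DOMAINS = {"pr_maintenance": PrMaintenance, "ctf": CtfDomain}


def select_domain(argv: Optional[List[str]] = None,
                  fallback: str = "pr_maintenance"):
    """Pick a domain from a --domain flag, falling back if it cannot run."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--domain", choices=sorted(DOMAINS), default=fallback)
    args = parser.parse_args(argv)
    domain = DOMAINS[args.domain]()
    check = domain.can_run()
    if not check.ready:
        print(f"{args.domain} unavailable ({check.reason}); using {fallback}")
        domain = DOMAINS[fallback]()
    return domain


def file_sha256(path: str) -> str:
    """Hash a grader's source file so it can be compared against a pin."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def grader_is_pinned(path: str, pinned_hash: str) -> bool:
    """True when the grader source on disk still matches the pinned hash."""
    return file_sha256(path) == pinned_hash
```

A regression test for a new domain could then assert that select_domain(["--domain", "<name>"]) returns the new adapter when its dependencies are present, and that editing the grader file makes grader_is_pinned return False.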
Reference
- csis/domains/pr_maintenance.py: simplest reference (~100 LOC)
- csis/domains/lean_math.py: example with graceful fallback
Time estimate
~6-12 hours depending on domain complexity.