v19 checklist-driven fresh audit: CLAUDE.md + last 5,930/7,612 drifts #35

Closed
lucapinello wants to merge 1 commit into chorus-applications from audit/2026-04-21-v19-checklist-driven

@lucapinello
Contributor

Summary

First audit walked against the 18-section runbook in audits/AUDIT_CHECKLIST.md (shipped in PRs #33 + #34). Leaves behind the full artefact bundle the checklist appendix asks for.

Durable guidance — CLAUDE.md at repo root

New top-level CLAUDE.md tells every future Claude Code session to read audits/AUDIT_CHECKLIST.md before any "ship-ready" pass. Audits should walk the checklist top to bottom and drop artefacts in audits/YYYY-MM-DD_vNN_<label>/.

Fixes — the last two stale canonical numbers in live code

  1. In chorus/oracles/alphagenome.py:22, the AlphaGenomeOracle docstring said "5,930 human functional genomic tracks"; the real count is 5,731. This matches the v16/v17/v18 fixes to notebooks, README, MCP server, and metadata, and was the last 5,930 anywhere in live code.
  2. In scripts/build_backgrounds_borzoi.py:4, the module docstring said "all 7,612 Borzoi tracks"; the real count is 7,611. This matches the v17 fix to scripts/README.md.
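
A drift check like the one behind these two fixes could be sketched in Python as below. This is an illustration under assumptions, not the audit's actual script: the function name `find_stale_counts`, the `.py`/`.md` suffix filter, and the rule of skipping a top-level `audits/` directory are all hypothetical choices that mirror the grep described in this PR.

```python
import re
from pathlib import Path

# Stale counts named in this PR, mapped to their corrected values.
STALE_COUNTS = {"5,930": "5,731", "7,612": "7,611"}
STALE_RE = re.compile("|".join(re.escape(s) for s in STALE_COUNTS))


def find_stale_counts(root, suffixes=(".py", ".md"), skip_dirs=("audits",)):
    """Return (relative_path, line_number, stale_value) for each hit under root."""
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        rel = path.relative_to(root)
        # Assumption: historical audit snapshots quote the old numbers on
        # purpose, so anything under a top-level skip directory is ignored.
        if rel.parts[0] in skip_dirs:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for match in STALE_RE.finditer(line):
                hits.append((rel.as_posix(), lineno, match.group(0)))
    return hits


if __name__ == "__main__":
    for rel_path, lineno, value in find_stale_counts("."):
        print(f"{rel_path}:{lineno}: {value} should be {STALE_COUNTS[value]}")
```

Run from the repo root, an empty result would correspond to the "no remaining drifts" state this PR aims for.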

Checklist results (PASS unless noted)

| § | Topic | Result |
|---|---|---|
| 3 | GPU / device | PASS: all 6 envs detect Metal/MPS on macOS arm64 |
| 4 | Per-track CDFs | PASS: monotonic + p50≤p95≤p99 + signed% match semantics for all 6 oracles |
| 5 | Python API | PASS: sequence_length matches spec; errors clear |
| 7 | HTML reports | PASS: 18/18 shipped HTMLs render via selenium with 0 JS errors (16 screenshots committed, 2 basename collisions across dirs) |
| 10 | Repo consistency | 2 drifts, fixed here |
| 11 | Test suite | PASS: 334 passed / 1 skipped (9m) |
| 13 | Determinism (mock) | PASS: real-oracle check deferred to release host |
| 15 | Offline | PASS: 0 runtime CDN fetches across all 18 HTMLs (`<script src="http…">` / `<link href="http…">` greps empty; apparent "CDN refs" earlier were attribution comments inside bundled IGV.js) |
| 16 | Logging hygiene | PASS: no committed HF tokens or AWS keys |
| 18 | License | LICENSE present (MIT, Pinello Lab) |
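
The §4 CDF checks (monotonic, p50≤p95≤p99) can be illustrated with a small sketch. The data layout is an assumption: each track is taken to be a sorted value grid plus the cumulative probability at each grid point, which may not match how the repo stores its per-track CDFs. The function names are hypothetical, and the signed% check is left out because its semantics are not described in this PR.

```python
def is_monotonic(cdf):
    """True if the cumulative probabilities never decrease."""
    return all(a <= b for a, b in zip(cdf, cdf[1:]))


def percentile_from_cdf(grid, cdf, q):
    """Return the first grid value whose cumulative probability reaches q."""
    for x, p in zip(grid, cdf):
        if p >= q:
            return x
    return grid[-1]


def check_track(grid, cdf):
    """Return a list of failure messages; an empty list means the track passes."""
    problems = []
    if not is_monotonic(cdf):
        problems.append("CDF is not monotonic")
    p50, p95, p99 = (percentile_from_cdf(grid, cdf, q) for q in (0.50, 0.95, 0.99))
    if not (p50 <= p95 <= p99):
        problems.append(f"percentiles out of order: {p50}, {p95}, {p99}")
    return problems


if __name__ == "__main__":
    # Toy track: four grid points with increasing cumulative probability.
    print(check_track([0.0, 1.0, 2.0, 3.0], [0.1, 0.5, 0.96, 1.0]))
```

Returning a list of messages instead of raising lets an audit script collect failures across all tracks and oracles in a single pass.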

Known issues flagged, NOT fixed here

  • §18 no NOTICE / docs/THIRD_PARTY.md — 6 oracle models and bundled IGV.js should be attributed in one reachable place. P1 for release.
  • §17 pip-audit not in base env — add to environment.yml dev deps or run as release-host CI step.
  • §1 & §14 deferred to release-host audit — fresh install from clean machine + genomics edge cases (telomere, soft-masked, indels) need ~80 GB disk and hours of runtime; better run on the Linux/CUDA machine being shipped.

Artefacts in audits/2026-04-21_v19_fresh_audit/

  • report.md — this summary
  • screenshots/*.png (16) — selenium-rendered HTML reports at 1600×4500
  • cdf_check.txt, device_probe.txt, python_api.txt, determinism.txt, consistency_grep.txt, cdn_runtime.txt, pip_audit.txt, pytest.txt

Future audits diff against these.
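
The gap between 18 HTML reports and 16 committed screenshots is what one would expect if each screenshot is named after the report's basename and written into one flat directory. The sketch below shows that effect and how a future audit could detect it up front. The naming rule is an assumption about the audit script, and both function names are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def screenshot_names(html_paths):
    """Map each screenshot filename to the HTML reports that would write to it."""
    by_name = defaultdict(list)
    for p in html_paths:
        # Assumed naming rule: <basename without extension>.png
        by_name[PurePosixPath(p).stem + ".png"].append(p)
    return dict(by_name)


def collisions(html_paths):
    """Return only the screenshot names claimed by more than one report."""
    return {name: paths for name, paths in screenshot_names(html_paths).items()
            if len(paths) > 1}


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    reports = ["dir_a/report.html", "dir_b/report.html", "dir_a/summary.html"]
    print(collisions(reports))
```

Prefixing each screenshot with its parent directory name would be one way to keep a one-to-one mapping between reports and screenshots.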

Test plan

  • pytest tests/ --ignore=tests/test_smoke_predict.py -q → 334 passed / 1 skipped (9m)
  • Selenium-rendered all 18 HTMLs — 0 JS errors on each
  • §10 drift greps return only the 2 fixed-in-this-PR matches
  • §15 runtime CDN grep returns empty (reports are offline-safe)
  • grep '5,930\|7,612' --include='*.md' --include='*.py' -r . after fixes → empty (live code only; audits/ historical snapshots excluded)
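
The §15 runtime CDN grep could be expressed in Python roughly as follows. This is a sketch assuming the check only needs to find `<script src="http…">` and `<link href="http…">` tags; the regex and the function name `remote_assets` are illustrative, not the audit's actual commands. URLs that appear only inside comments in bundled JavaScript do not match, which is consistent with the attribution comments in IGV.js not counting as runtime fetches.

```python
import re

# Matches <script ... src="http(s)://..."> and <link ... href="http(s)://...">,
# i.e. assets a browser would fetch when the report loads.
REMOTE_ASSET_RE = re.compile(
    r"""<(?:script\b[^>]*\bsrc|link\b[^>]*\bhref)\s*=\s*["']https?://[^"']+""",
    re.IGNORECASE,
)


def remote_assets(html):
    """Return the remote <script>/<link> references found in an HTML string."""
    return REMOTE_ASSET_RE.findall(html)


if __name__ == "__main__":
    import sys
    from pathlib import Path

    # Usage: python check_offline.py report1.html report2.html ...
    failed = False
    for name in sys.argv[1:]:
        found = remote_assets(Path(name).read_text(encoding="utf-8", errors="replace"))
        for ref in found:
            print(f"{name}: {ref}")
        failed = failed or bool(found)
    sys.exit(1 if failed else 0)
```

A regex scan like this only covers static tags; scripts that inject remote URLs at runtime would need a browser-level check such as inspecting network requests during the selenium render.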

🤖 Generated with Claude Code

First audit walked against the 18-section checklist shipped in PRs #33
+ #34. Artefacts land in audits/2026-04-21_v19_fresh_audit/ per the
checklist appendix: 16 selenium-rendered screenshots (18 HTMLs; 2
basename collisions across dirs), CDF sanity output, per-env device
probe, consistency greps, Python API probe, determinism check,
pytest log.

## CLAUDE.md at repo root

Points every future Claude Code session at audits/AUDIT_CHECKLIST.md
before any "ship-ready" pass, so the checklist doesn't drift by
memory. Also mentions the audits/ chronology for cumulative context.

## Fixes — the last two `5,930` / `7,612` in live code

1. chorus/oracles/alphagenome.py:22 — AlphaGenomeOracle docstring said
   "5,930 human functional genomic tracks"; real count is 5,731
   (matches v16/v17/v18 fixes to notebooks, README, server.py,
   metadata). This was the last 5,930 in the live codebase.

2. scripts/build_backgrounds_borzoi.py:4 — module docstring said
   "all 7,612 Borzoi tracks"; real count is 7,611 (matches v17 fix to
   scripts/README.md).

## Checklist results (PASS unless noted)

- §3 GPU/device matrix: all 6 envs detect Metal/MPS on macOS arm64.
- §4 CDF sanity: monotonic effect_cdfs + p50<=p95<=p99 + signed% all
  match expected semantics for all 6 oracles.
- §5 Python API: sequence_length matches spec for all 6; invalid
  oracle name gives helpful ValueError; ModelNotLoadedError clear.
- §7 HTML reports: 18/18 shipped HTMLs render via selenium with
  0 JS console errors. Screenshots committed.
- §10 consistency: 2 stale 5,930/7,612 drifts — fixed here.
- §11 pytest: 334 passed / 1 skipped (9m).
- §13 determinism (mock): PASS; real-oracle check deferred to release
  host.
- §15 offline: 0 runtime CDN fetches (greps for <script src="http…"
  and <link href="http…" empty across all 18 HTMLs).
- §16 logging hygiene: no committed HF tokens or AWS keys.
- §18 LICENSE present (MIT, Pinello Lab).

## Flagged, NOT fixed here

- §18 No NOTICE / docs/THIRD_PARTY.md attributing the 6 oracle models
  or the bundled IGV.js. P1 for release.
- §17 pip-audit not in base env. Add to environment.yml dev deps or
  run as release-host CI step.
- §1 & §14 (fresh install + genomics edge cases) need a release host
  with ~80 GB disk and hours of runtime — belong in the release audit.

Full report at audits/2026-04-21_v19_fresh_audit/report.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lucapinello
Contributor Author

Superseded by another agent's v19/v20 work, which landed most of the same fixes (5,731 docstring, CLAUDE.md, checklist additions) with a better CLAUDE.md. I am cherry-picking the remaining one-line 7,612→7,611 fix into a new PR, along with a bug fix for the §14.4 KeyError('H') finding.
