v18 fresh full audit: reusable AUDIT_CHECKLIST + 3 HF gate fixes #33
Open
lucapinello wants to merge 1 commit into chorus-applications from
Ran a from-scratch audit with nothing cached or precomputed — fresh
notebook re-execution, selenium IGV rendering (so the client-side JS
actually loads), CDF sanity across all 6 oracle normalizers, device
detection across all 6 oracle envs on macOS arm64, and a trace of the
HuggingFace gate path.
## Main deliverable
`audits/AUDIT_CHECKLIST.md` — a 12-section reusable checklist covering
Install → HF auth → GPU/device → CDFs → Python API → Notebooks → HTML
reports (incl. IGV) → MCP server → Error paths → Repo-wide consistency
→ Tests → Reproducibility. Every check has an exact command or
grep pattern. P0/P1/P2 severity on each item. Future audits should
walk this top-to-bottom and leave behind a populated
`audits/YYYY-MM-DD_vNN_<label>/` per the appendix.
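As a rough illustration of the "exact command plus severity" idea, the sketch below models one checklist item and a runner that walks the items top-to-bottom. The `Check` class, `run_checks`, and `blocking_failures` are hypothetical names invented for this example; the real checklist is a markdown file, and nothing here is taken from the repo.

```python
import subprocess
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    """One checklist item (hypothetical structure, not the repo's format)."""
    section: str   # e.g. "Install", "HF auth"
    severity: str  # "P0", "P1", or "P2"
    command: str   # exact shell command; exit code 0 means the check passed


def run_checks(checks):
    """Run every check in the order given and return (check, passed) pairs."""
    results = []
    for check in checks:
        proc = subprocess.run(
            check.command, shell=True, capture_output=True, text=True
        )
        results.append((check, proc.returncode == 0))
    return results


def blocking_failures(results):
    """Return the failed P0 checks, i.e. the ones that would block shipping."""
    return [c for c, ok in results if not ok and c.severity == "P0"]
```

A caller would build the list of `Check` items from the checklist, run it once, and treat any entry in `blocking_failures(...)` as a stop condition.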
## Fixes
The HF agent caught 3 live drifts in the AlphaGenome gate flow:
1. chorus/oracles/alphagenome.py:133 — the "no HF token" error pointed
at `huggingface.co/google/alphagenome`, but the real gated repo is
`google/alphagenome-all-folds` (README.md:631, README.md:926,
environments/README.md:105 all agree). A user clicking the link
would not find the license form. Fixed.
2. chorus/oracles/alphagenome_source/templates/load_template.py:49 —
the env-runner load path raised with no URL at all ("Set HF_TOKEN
or run huggingface-cli login"). Appended the alphagenome-all-folds
license URL so direct and env-runner paths give the same actionable
guidance.
3. chorus/oracles/alphagenome_source/alphagenome_metadata.py:4 —
module docstring said "AlphaGenome predicts 5,930+ human functional
genomic tracks"; the library reports 5,731 (matches v17 audit and
the notebooks after v16). Updated.
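One way to keep the direct and env-runner messages from drifting apart again is to build both from a single constant. This is only a sketch: `ALPHAGENOME_GATED_REPO`, `ALPHAGENOME_LICENSE_URL`, and `missing_token_message` are hypothetical names, the message wording is illustrative, and whether the load template can import from a shared module is an assumption.

```python
# Hypothetical shared constants; the repo id itself comes from the fix above.
ALPHAGENOME_GATED_REPO = "google/alphagenome-all-folds"
ALPHAGENOME_LICENSE_URL = f"https://huggingface.co/{ALPHAGENOME_GATED_REPO}"


def missing_token_message() -> str:
    """Build the 'no HF token' guidance from the single source of truth."""
    return (
        "No HuggingFace token found. Set HF_TOKEN or run "
        "`huggingface-cli login`, then accept the license at "
        f"{ALPHAGENOME_LICENSE_URL}"
    )
```

With this shape, a change to the gated repo name would be made in one place and picked up by every code path that raises the error.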
Also tightened tests/test_error_recovery.py:169 from a loose substring
match on `huggingface.co/google/alphagenome` to the full correct URL
`huggingface.co/google/alphagenome-all-folds` so the test catches
future drift in CI.
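The reason the loose match could not catch the drift is that the wrong URL is a prefix of the right one, so a substring check passes for both. The sketch below shows that, along with a stricter check that extracts the URL and compares it whole. The helper names and the sample messages are made up for illustration and are not the contents of `tests/test_error_recovery.py`.

```python
import re

EXPECTED_URL = "huggingface.co/google/alphagenome-all-folds"
LOOSE_FRAGMENT = "huggingface.co/google/alphagenome"

# Matches huggingface.co/<org>/<repo> without swallowing a trailing period.
URL_RE = re.compile(r"huggingface\.co/[\w-]+/[\w-]+(?:\.[\w-]+)*")


def gate_urls(message: str) -> list[str]:
    """Return every huggingface.co/<org>/<repo> reference in the message."""
    return URL_RE.findall(message)


def points_at_gated_repo(message: str) -> bool:
    """True only if the message has a URL and every one is the expected repo."""
    urls = gate_urls(message)
    return bool(urls) and all(url == EXPECTED_URL for url in urls)


# Illustrative messages (not the real error text).
old_message = "Request access at https://huggingface.co/google/alphagenome"
new_message = "Request access at https://huggingface.co/google/alphagenome-all-folds."

# The loose substring check cannot tell the two apart.
assert LOOSE_FRAGMENT in old_message and LOOSE_FRAGMENT in new_message
# The whole-URL comparison can.
assert not points_at_gated_repo(old_message)
assert points_at_gated_repo(new_message)
```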
## What the audit found healthy
- Fresh notebook run: exit 0, zero errors, zero warnings. The ref-
allele warning that used to fire on cell 39 is gone (v17 PR #32 fix
flowed through).
- All 6 CDF normalizers load, are sorted monotonically, and their signed%
matches expected semantics. Sei + LegNet auto-download from the HF dataset works.
- Device detection clean across TF / PyTorch / JAX envs on macOS arm64
(Metal / MPS detected correctly; zero hardcoded cuda:0).
- 5 selenium-rendered HTML reports: IGV tracks load, 0 browser-console
JS errors.
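For readers unfamiliar with the monotonicity check, a minimal sketch follows. It assumes a normalizer's background is a plain sorted sequence of scores; the audit does not describe the actual storage format, and both function names are hypothetical.

```python
from bisect import bisect_right


def is_sorted_monotonic(background) -> bool:
    """True if the background scores never decrease from one entry to the next."""
    return all(a <= b for a, b in zip(background, background[1:]))


def empirical_cdf(background, score) -> float:
    """Fraction of background scores <= score; the input must already be sorted."""
    if not background:
        raise ValueError("background must not be empty")
    return bisect_right(background, score) / len(background)
```

The binary search in `empirical_cdf` is only valid on sorted input, which is why a sortedness check is worth running before trusting any percentile it returns.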
Full report at `audits/2026-04-21_v18_fresh_full_audit.md`.
Tests: 334 passed / 1 skipped (fast suite, 9m 11s).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced Apr 21, 2026
Merged
## Summary
A from-scratch audit with nothing cached or precomputed — fresh notebook re-execution, selenium IGV rendering (so the client-side JS actually loads tracks instead of a placeholder), CDF sanity across all 6 oracle normalizers, device detection across all 6 oracle envs on macOS arm64, and a trace of the HuggingFace gate that AlphaGenome users hit on first use.
## Main deliverable: `audits/AUDIT_CHECKLIST.md`
A reusable 12-section runbook for future audits. Every check has an exact command or `grep` pattern, plus P0/P1/P2 severity tags so you know what blocks ship vs. what's polish:
- `environment.yml`, `chorus setup`, `chorus genome download`, per-oracle env existence
- `HF_TOKEN`, `whoami`, license-repo URL consistency across 3 code paths
- `CUDA_VISIBLE_DEVICES` respect
- `create_oracle`, `sequence_length`, `ModelNotLoadedError`, ref-allele warning
- `list_oracles` spec sync

Appendix: exactly what artefacts an audit should leave behind in `audits/YYYY-MM-DD_vNN_<label>/` so the next auditor can diff mechanically.
## Fixes
The HF-gate agent caught 3 live drifts — all in the flow a first-time AlphaGenome user follows:
1. `chorus/oracles/alphagenome.py:133`: the "no HF token" error pointed at `huggingface.co/google/alphagenome`, but the real gated repo is `google/alphagenome-all-folds` (matches `README.md:631`, `README.md:926`, `environments/README.md:105`). A user clicking the link would not find the license form. Fixed.
2. `chorus/oracles/alphagenome_source/templates/load_template.py:49`: the env-runner load path raised with no URL at all. Appended the `alphagenome-all-folds` license URL so direct and env-runner paths give the same actionable guidance.
3. `chorus/oracles/alphagenome_source/alphagenome_metadata.py:4`: module docstring said "5,930+ tracks"; the library reports 5,731 (matches v17 audit and post-v16 notebooks). Updated.

Also tightened `tests/test_error_recovery.py:169` from a loose substring match to the full correct URL so the test actually catches this drift in CI.
## What the audit found healthy
- Sei + LegNet auto-download from `huggingface.co/datasets/lucapinello/chorus-backgrounds` works.
- Zero hardcoded `cuda:0` in live code.
## Known issues NOT fixed here
- Genomes directory defaults to `<repo>/genomes/` in `core/globals.py:13`. Should be user-overridable (`CHORUS_GENOMES_DIR`) and default to `~/.chorus/genomes/`.
- `EnvironmentManager.install_chorus_primitive` can raise with empty stderr on install failure.
- […] `chorus genome download` auto-resumes after a stall.

Each is worth its own focused PR.
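The override proposed for the genomes directory could look roughly like the sketch below. Only the `CHORUS_GENOMES_DIR` variable and the `~/.chorus/genomes/` default come from the note above; the function name `resolve_genomes_dir` and its `environ` parameter are hypothetical, and this is not the code in `core/globals.py`.

```python
import os
from pathlib import Path


def resolve_genomes_dir(environ=None) -> Path:
    """Pick the genomes directory: env override first, else ~/.chorus/genomes."""
    # Accept an explicit mapping so the function can be tested without
    # touching the real process environment.
    env = os.environ if environ is None else environ
    override = env.get("CHORUS_GENOMES_DIR")
    if override:
        return Path(override).expanduser()
    return Path.home() / ".chorus" / "genomes"
```

Treating an empty `CHORUS_GENOMES_DIR` the same as an unset one is a design choice in this sketch; it avoids resolving to the current working directory by accident.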
## Test plan
- `pytest tests/ --ignore=tests/test_smoke_predict.py -q` → 334 passed / 1 skipped (9m 11s)
- `jupyter nbconvert --execute single_oracle_quickstart.ipynb` → exit 0, 0 errors, 0 warnings

🤖 Generated with Claude Code