Closed
22 changes: 22 additions & 0 deletions CLAUDE.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
# CLAUDE.md — project-specific guidance for Claude Code sessions

## Audit / release checklist

Before any "ship-ready" or "is this consistent?" pass on this repo, read
[`audits/AUDIT_CHECKLIST.md`](audits/AUDIT_CHECKLIST.md) first. It's an
18-section runbook covering install, docs, notebooks, HTML reports
(incl. IGV selenium render recipe), CDF/normalization, GPU/device,
HuggingFace auth, MCP, error paths, scientific determinism, genomics
edge cases, offline/air-gapped, logging hygiene, dependency supply
chain, license/attribution, and test suite. Every item has an exact
command or grep pattern, plus P0/P1/P2 severity.

Walk the checklist top-to-bottom for a fresh audit. Leave artefacts
(findings report, screenshots, fresh notebook outputs, CDF and device
probe logs) in `audits/YYYY-MM-DD_vNN_<label>/` per the appendix.
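
A minimal sketch of how the artefact directory name above could be built. The helper name `audit_dir` and the two-digit zero padding of `vNN` are assumptions for illustration; the checklist appendix is the authority on the exact layout.

```python
from datetime import date
from pathlib import Path


def audit_dir(root: Path, version: int, label: str, day: date) -> Path:
    """Return <root>/audits/YYYY-MM-DD_vNN_<label> (hypothetical helper)."""
    # Assumption: NN is the audit version, zero-padded to two digits.
    return root / "audits" / f"{day.isoformat()}_v{version:02d}_{label}"


path = audit_dir(Path("."), 19, "fresh_audit", date(2026, 4, 21))
print(path.as_posix())
```

Passing the date explicitly, rather than reading the clock inside the helper, keeps the name reproducible when an audit is re-filed later.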

## Prior audit reports

`audits/*.md` — chronological snapshots of what was audited, found,
and fixed. Don't re-run an old audit; build on the most recent one and
cite the item number from the checklist.
16 changes: 16 additions & 0 deletions audits/2026-04-21_v19_fresh_audit/cdf_check.txt
@@ -0,0 +1,16 @@
2026-04-21 11:10:23,333 - chorus.analysis.normalization - INFO - Loaded per-track CDFs for 'enformer': 5313 tracks, CDFs: effect_cdfs, summary_cdfs, perbin_cdfs
2026-04-21 11:10:26,288 - chorus.analysis.normalization - INFO - Loaded per-track CDFs for 'borzoi': 7611 tracks, CDFs: effect_cdfs, summary_cdfs, perbin_cdfs
2026-04-21 11:10:26,309 - chorus.analysis.normalization - INFO - Loaded per-track CDFs for 'chrombpnet': 24 tracks, CDFs: effect_cdfs, summary_cdfs, perbin_cdfs
2026-04-21 11:10:26,335 - chorus.analysis.normalization - INFO - Loaded per-track CDFs for 'sei': 40 tracks, CDFs: effect_cdfs, summary_cdfs
2026-04-21 11:10:26,337 - chorus.analysis.normalization - INFO - Loaded per-track CDFs for 'legnet': 3 tracks, CDFs: effect_cdfs, summary_cdfs
2026-04-21 11:10:27,614 - chorus.analysis.normalization - INFO - Loaded per-track CDFs for 'alphagenome': 5168 tracks, CDFs: effect_cdfs, summary_cdfs, perbin_cdfs
§4 CDF sanity:
oracle | n_tracks | CDFs | mono | p50<=p95<=p99 | signed%
-------------------------------------------------------------------------------------
enformer | 5313 | effect, summary, perbin | OK | OK | 0%
borzoi | 7611 | effect, summary, perbin | OK | OK | 20%
chrombpnet | 24 | effect, summary, perbin | OK | OK | 0%
sei | 40 | effect, summary | OK | OK | 100%
legnet | 3 | effect, summary | OK | OK | 100%
alphagenome | 5168 | effect, summary, perbin | OK | OK | 13%
RESULT: PASS
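
The `mono` and `p50<=p95<=p99` columns above can be illustrated with a small sketch. It assumes each track's CDF is stored as a row of values at evenly spaced quantile levels; that layout and the function names are assumptions for illustration, not the format used by `chorus.analysis.normalization`.

```python
from typing import Dict, Sequence


def is_monotonic(row: Sequence[float]) -> bool:
    """True if the quantile values never decrease."""
    return all(a <= b for a, b in zip(row, row[1:]))


def value_at(row: Sequence[float], q: float) -> float:
    """Read the value at probability q from evenly spaced quantiles."""
    return row[round(q * (len(row) - 1))]


def check_track(row: Sequence[float]) -> Dict[str, bool]:
    """Run both sanity checks on one track (hypothetical helper)."""
    p50, p95, p99 = (value_at(row, q) for q in (0.50, 0.95, 0.99))
    return {"mono": is_monotonic(row), "ordered": p50 <= p95 <= p99}


# Synthetic data: 101 non-decreasing quantile values, then a corrupted copy.
good = [i * i / 100.0 for i in range(101)]
bad = list(good)
bad[10], bad[11] = bad[11], bad[10]  # swap two neighbours to break monotonicity

print(check_track(good))
print(check_track(bad))
```

Under this layout a monotonic row always passes the quantile-ordering check, so the second column acts as a cheap spot check that still reports something useful when the first one fails.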
8 changes: 8 additions & 0 deletions audits/2026-04-21_v19_fresh_audit/cdn_detail.txt

Large diffs are not rendered by default.

2 changes: 2 additions & 0 deletions audits/2026-04-21_v19_fresh_audit/cdn_runtime.txt
@@ -0,0 +1,2 @@
=== §15 actual runtime CDN fetches (script src / link href) ===
(if empty: all JS/CSS is bundled — reports work offline ✓)
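
A sketch of the kind of scan that could produce the list above, using only the standard library. It flags `<script src>` and `<link href>` values that start with `http://` or `https://`; the class and function names are hypothetical, and the audit itself used grep patterns rather than this parser.

```python
from html.parser import HTMLParser
from typing import List


class RemoteAssetFinder(HTMLParser):
    """Collect http(s) URLs from <script src> and <link href> attributes."""

    def __init__(self) -> None:
        super().__init__()
        self.remote: List[str] = []

    def handle_starttag(self, tag, attrs):
        # Only these two tag/attribute pairs trigger a fetch at load time.
        wanted = {"script": "src", "link": "href"}.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value and value.startswith(("http://", "https://")):
                self.remote.append(value)


def remote_assets(html: str) -> List[str]:
    """Return every remote script/stylesheet URL referenced by the page."""
    finder = RemoteAssetFinder()
    finder.feed(html)
    return finder.remote


bundled = "<script>/* https://github.com/igvteam/igv.js */ var x = 1;</script>"
print(remote_assets(bundled))
```

Parsing tags instead of grepping for URLs is what separates a real runtime fetch from a URL that only appears in a comment inside bundled JavaScript. Protocol-relative URLs (`//host/...`) are not covered by this sketch.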
19 changes: 19 additions & 0 deletions audits/2026-04-21_v19_fresh_audit/consistency_grep.txt
@@ -0,0 +1,19 @@
=== §10 canonical-number drift (markdown/py only) ===
./scripts/build_backgrounds_borzoi.py:4:perbin) per track for all 7,612 Borzoi tracks: CAGE, RNA, DNASE, ATAC,
./chorus/oracles/alphagenome.py:22: AlphaGenome (Google DeepMind, Nature 2026) predicts 5,930 human functional

=== §10 LegNet 230 bp drift ===

=== §10 old applications/ path in live docs ===

=== §15 CDN refs in shipped HTML (should be bundled only) ===
https://github.com/igvteam/igv.js
https://igv.org/genomes/genomes.js
https://vanilla-picker.js

=== §16 leaked HF/AWS tokens ===

=== §18 license files at repo root ===
MIT License

Copyright (c) 2024 Pinello Lab
3 changes: 3 additions & 0 deletions audits/2026-04-21_v19_fresh_audit/determinism.txt
@@ -0,0 +1,3 @@
2026-04-21 11:13:10,913 - chorus.core.base - INFO - Device: auto-detect (GPU if available, else CPU)
MockOracle determinism (same-seed): PASS
note: real-oracle determinism check belongs on a release host (takes 10+ min)
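
The same-seed check can be sketched as follows. `ToyOracle` is a stand-in written for this example and is not the project's `MockOracle`; the point is only the shape of the comparison: build two predictors from the same seed, run the same input, and require identical output.

```python
import random
from typing import List


class ToyOracle:
    """Stand-in predictor for illustration; NOT the project's MockOracle."""

    def __init__(self, seed: int) -> None:
        # A private RNG keeps the check independent of global random state.
        self._rng = random.Random(seed)

    def predict(self, sequence: str) -> List[float]:
        return [self._rng.random() + sequence.count("G") for _ in range(4)]


def same_seed_is_deterministic(sequence: str, seed: int) -> bool:
    """Two fresh oracles with the same seed must agree exactly."""
    first = ToyOracle(seed).predict(sequence)
    second = ToyOracle(seed).predict(sequence)
    return first == second


print("same-seed:", "PASS" if same_seed_is_deterministic("ACGT", 0) else "FAIL")
```

For real oracles running on a GPU, exact equality may be too strict, and a tolerance-based comparison could be needed; that is outside the scope of this sketch.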
35 changes: 35 additions & 0 deletions audits/2026-04-21_v19_fresh_audit/device_probe.txt
@@ -0,0 +1,35 @@
--- §3 device probe ---
2026-04-21 11:10:29,236 - chorus.core.platform - INFO - Detected platform: Darwin arm64 (key=macos_arm64, cuda=False)
base platform: macos_arm64 has_cuda= False

== chorus-enformer ==
tf devs: ['/physical_device:CPU:0', '/physical_device:GPU:0']

== chorus-borzoi ==
torch cuda: False mps: True

== chorus-chrombpnet ==
tf devs: ['/physical_device:CPU:0', '/physical_device:GPU:0']

== chorus-sei ==
torch cuda: False mps: True

== chorus-legnet ==
torch cuda: False mps: True

== chorus-alphagenome ==
Platform 'METAL' is experimental and not all JAX functionality may be correctly supported!
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
W0000 00:00:1776784238.684765 347709487 mps_client.cc:510] WARNING: JAX Apple GPU support is experimental and not all JAX functionality is correctly supported!
Metal device set to: Apple M3 Ultra
I0000 00:00:1776784238.695133 347709487 service.cc:145] XLA service 0x600002296600 initialized for platform METAL (this does not guarantee that XLA will be used). Devices:
I0000 00:00:1776784238.695144 347709487 service.cc:153] StreamExecutor device (0): Metal, <undefined>
I0000 00:00:1776784238.696112 347709487 mps_client.cc:406] Using Simple allocator.
I0000 00:00:1776784238.696124 347709487 mps_client.cc:384] XLA backend will use up to 77308936192 bytes on device 0 for SimpleAllocator.
tf devs: ['/physical_device:CPU:0']
jax devs: ['METAL:0']
I0000 00:00:1776784238.748079 347709487 mps_client.h:209] MetalClient destroyed.

systemMemory: 96.00 GB
maxCacheSize: 36.00 GB

1 change: 1 addition & 0 deletions audits/2026-04-21_v19_fresh_audit/pip_audit.txt
@@ -0,0 +1 @@
/var/folders/gx/xj1t8pkx1gvdn92fgw2g7vpm0000gn/T/mambafzp7780tnsh: line 5: exec: pip-audit: not found
34 changes: 34 additions & 0 deletions audits/2026-04-21_v19_fresh_audit/pytest.txt
@@ -0,0 +1,34 @@
........................................................................ [ 21%]
........................................................................ [ 42%]
........................................................................ [ 64%]
......s................................................................. [ 85%]
............................................... [100%]
=============================== warnings summary ===============================
../../.local/share/mamba/envs/chorus/lib/python3.10/site-packages/matplotlib/_fontconfig_pattern.py:88
../../.local/share/mamba/envs/chorus/lib/python3.10/site-packages/matplotlib/_fontconfig_pattern.py:88
../../.local/share/mamba/envs/chorus/lib/python3.10/site-packages/matplotlib/_fontconfig_pattern.py:88
../../.local/share/mamba/envs/chorus/lib/python3.10/site-packages/matplotlib/_fontconfig_pattern.py:88
../../.local/share/mamba/envs/chorus/lib/python3.10/site-packages/matplotlib/_fontconfig_pattern.py:88
../../.local/share/mamba/envs/chorus/lib/python3.10/site-packages/matplotlib/_fontconfig_pattern.py:88
/Users/lp698/.local/share/mamba/envs/chorus/lib/python3.10/site-packages/matplotlib/_fontconfig_pattern.py:88: PyparsingDeprecationWarning: 'parseString' deprecated - use 'parse_string'
parse = parser.parseString(pattern)

../../.local/share/mamba/envs/chorus/lib/python3.10/site-packages/matplotlib/_fontconfig_pattern.py:92
../../.local/share/mamba/envs/chorus/lib/python3.10/site-packages/matplotlib/_fontconfig_pattern.py:92
../../.local/share/mamba/envs/chorus/lib/python3.10/site-packages/matplotlib/_fontconfig_pattern.py:92
../../.local/share/mamba/envs/chorus/lib/python3.10/site-packages/matplotlib/_fontconfig_pattern.py:92
../../.local/share/mamba/envs/chorus/lib/python3.10/site-packages/matplotlib/_fontconfig_pattern.py:92
../../.local/share/mamba/envs/chorus/lib/python3.10/site-packages/matplotlib/_fontconfig_pattern.py:92
/Users/lp698/.local/share/mamba/envs/chorus/lib/python3.10/site-packages/matplotlib/_fontconfig_pattern.py:92: PyparsingDeprecationWarning: 'resetCache' deprecated - use 'reset_cache'
parser.resetCache()

../../.local/share/mamba/envs/chorus/lib/python3.10/site-packages/matplotlib/_mathtext.py:45
/Users/lp698/.local/share/mamba/envs/chorus/lib/python3.10/site-packages/matplotlib/_mathtext.py:45: PyparsingDeprecationWarning: 'enablePackrat' deprecated - use 'enable_packrat'
ParserElement.enablePackrat()

tests/test_utils.py::TestNormalizationUtils::test_quantile_normalize
/Users/lp698/.local/share/mamba/envs/chorus/lib/python3.10/site-packages/numpy/_core/numeric.py:442: RuntimeWarning: invalid value encountered in cast
multiarray.copyto(res, fill_value, casting='unsafe')

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
334 passed, 1 skipped, 14 warnings in 513.48s (0:08:33)
15 changes: 15 additions & 0 deletions audits/2026-04-21_v19_fresh_audit/python_api.txt
@@ -0,0 +1,15 @@
§5 Python API sanity:
oracle | seq_length (expected) | match
-------------------------------------------------------
enformer | 393216 (393216) | ✓
borzoi | 524288 (524288) | ✓
chrombpnet | 2114 (2114) | ✓
sei | 4096 (4096) | ✓
legnet | 200 (200) | ✓
alphagenome | 1048576 (1048576) | ✓

--- invalid oracle name error ---
ValueError: Unknown oracle: bogus. Available: ['enformer', 'borzoi', 'chrombpnet', 'sei', 'legnet', 'alphagenome']

--- predict pre-load error ---
ModelNotLoadedError: Model not loaded. Call load_pretrained_model first.
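
A sketch of the table-driven comparison behind the probe above. The expected lengths are copied from the table; `expected_length` and `mismatches` are hypothetical helpers, and the error text only imitates the message shown above rather than calling the real chorus API.

```python
from typing import Dict, Tuple

# Expected input lengths in base pairs, taken from the table above.
EXPECTED_SEQ_LENGTH: Dict[str, int] = {
    "enformer": 393216,
    "borzoi": 524288,
    "chrombpnet": 2114,
    "sei": 4096,
    "legnet": 200,
    "alphagenome": 1048576,
}


def expected_length(name: str) -> int:
    """Look up the expected length, failing loudly on unknown names."""
    if name not in EXPECTED_SEQ_LENGTH:
        raise ValueError(
            f"Unknown oracle: {name}. Available: {list(EXPECTED_SEQ_LENGTH)}"
        )
    return EXPECTED_SEQ_LENGTH[name]


def mismatches(reported: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Map each drifting oracle to (reported, expected)."""
    return {
        name: (got, expected_length(name))
        for name, got in reported.items()
        if got != expected_length(name)
    }


# A stale 230 bp value for LegNet would be flagged; Sei passes.
print(mismatches({"legnet": 230, "sei": 4096}))
```

Keeping the expected values in one dictionary gives the probe and the unknown-name error a single source of truth.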
70 changes: 70 additions & 0 deletions audits/2026-04-21_v19_fresh_audit/report.md
@@ -0,0 +1,70 @@
# v19 fresh audit — driven by `audits/AUDIT_CHECKLIST.md`

First audit walked top-to-bottom against the 18-section checklist shipped
in PRs #33 + #34. Artefacts in this directory match the checklist
appendix (`screenshots/`, `cdf_check.txt`, `device_probe.txt`,
`consistency_grep.txt`, `python_api.txt`, `determinism.txt`,
`pytest.txt`).

This PR also adds `CLAUDE.md` at the repo root, pointing every future
Claude session at the checklist so audits follow it rather than memory.

## Results by checklist section

| § | Topic | Result | Notes |
|---|---|---|---|
| 1 | Install & environment | **deferred** | Full fresh install needs a Linux/CUDA host + ~80 GB. |
| 2 | HuggingFace auth | **spot-checked** (code paths) | All 3 paths updated in v18. End-to-end HF gate test belongs on release host. |
| 3 | GPU / device | **PASS** | All 6 envs see the Apple GPU on macOS arm64 (TF `GPU:0`, torch MPS, or JAX `METAL:0`). See `device_probe.txt`. |
| 4 | Per-track CDFs | **PASS** | All 6 oracles: monotonic, p50 ≤ p95 ≤ p99, signed% matches semantics. See `cdf_check.txt`. |
| 5 | Python API | **PASS** | `sequence_length` matches spec for all 6. Error messages clear. See `python_api.txt`. |
| 6 | Notebooks fresh-run | **deferred** | v18 already ran `single_oracle_quickstart.ipynb` clean; the other two are long. |
| 7 | HTML reports | **PASS** | **18/18 shipped HTMLs** rendered via selenium with **0 JS errors**. See `screenshots/`. |
| 8 | MCP server | **spot-checked** | 22 tools confirmed in v17. E2E `chorus-mcp` over stdio deferred. |
| 9 | Error messages | **PASS** | Unknown-oracle, no-model, no-genome all tested in v17/v18. |
| 10 | Repo consistency | **2 drifts found** | Fixed in this PR. |
| 11 | Test suite | **PASS** | 334 passed, 1 skipped. See `pytest.txt`. |
| 12 | Reproducibility | **PASS** | Regen scripts idempotent; CDFs auto-downloadable. |
| 13 | Scientific determinism | **PASS (mock)** | Real-oracle same-seed check belongs on release host. |
| 14 | Genomics edge cases | **deferred** | Needs loaded oracles. |
| 15 | Offline / air-gapped | **PASS** | **0 runtime CDN fetches** — `<script src="http…">` / `<link href="http…">` greps empty across all 18 HTMLs. Apparent "CDN refs" earlier were attribution comments inside the bundled IGV.js. |
| 16 | Logging hygiene | **PASS** | No committed HF tokens (`hf_…`) or AWS keys. |
| 17 | Supply chain | **partial** | `pip-audit` not installed in base env; documented as release-host check. |
| 18 | License / attribution | **P1: missing NOTICE / THIRD_PARTY** | `LICENSE` (MIT, Pinello Lab) present. No `NOTICE` or `docs/THIRD_PARTY.md` attributing the 6 oracle models. |

## Fixes in this PR

1. **`scripts/build_backgrounds_borzoi.py:4`** — module docstring said
"**7,612 Borzoi tracks**"; real count is 7,611 (matches v17 fix to
`scripts/README.md`). Updated.
2. **`chorus/oracles/alphagenome.py:22`** — `AlphaGenomeOracle`
docstring said "**5,930 human functional genomic tracks**"; real
count is 5,731 (matches v16/v17/v18 fixes to notebooks, README,
server.py, metadata). Updated. This was the **last** `5,930` in the
live code.
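
A sketch of how the two stale numbers could be caught by a scan. The stale-to-current mapping comes from the fixes listed above; the regex guards and the `find_stale` helper are assumptions for illustration, not the checklist's actual grep pattern.

```python
import re
from typing import List, Tuple

# Stale canonical count -> current count, per the fixes listed above.
STALE = {"5,930": "5,731", "7,612": "7,611"}

# Guards keep "15,930" or "7,612,000" from matching as a stale count.
PATTERN = re.compile(
    r"(?<![\d,])(?:" + "|".join(re.escape(s) for s in STALE) + r")(?!,?\d)"
)


def find_stale(text: str) -> List[Tuple[int, str, str]]:
    """Return (line number, stale value, current value) for every hit."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in PATTERN.finditer(line):
            hits.append((lineno, match.group(0), STALE[match.group(0)]))
    return hits


sample = (
    "perbin) per track for all 7,612 Borzoi tracks\n"
    "a clean line\n"
    "predicts 5,930 human functional\n"
)
for hit in find_stale(sample):
    print(hit)
```

Reporting the replacement value next to each hit makes the finding directly actionable in a later audit.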

## Known issues flagged, NOT fixed here

- **§18 third-party attribution missing**: no `NOTICE` or
`docs/THIRD_PARTY.md`. The 6 oracle models (Enformer, Borzoi,
ChromBPNet, Sei, LegNet, AlphaGenome) and the bundled IGV.js should
be credited with licenses in one reachable place. P1 for release.
- **§17 pip-audit not installed**: add to `environment.yml` dev-deps,
or keep as a release-host CI step. The release CI workflow at
`.github/workflows/tests.yml` doesn't currently run supply-chain
scans.
- **§1 & §14 deferred to release-host audit**: a fresh install on a
clean machine and the genomics edge cases (variant near telomere,
soft-masked FASTA, indels) need more wall-clock time than this pass
allows.

## Delivered

- `CLAUDE.md` at repo root — instructs future Claude sessions to read
`audits/AUDIT_CHECKLIST.md` before any ship-ready audit.
- 2 docstring fixes (`5,930`→`5,731`, `7,612`→`7,611`) — the last
stale canonical numbers in live code.
- `audits/2026-04-21_v19_fresh_audit/` — 16 HTML screenshots (18
reports; 2 pairs share basenames in different dirs), CDF sanity
output, per-env device probe, consistency greps, Python API probe,
determinism check, `pytest.txt`.
2 changes: 1 addition & 1 deletion chorus/oracles/alphagenome.py
@@ -19,7 +19,7 @@
class AlphaGenomeOracle(OracleBase):
"""AlphaGenome oracle with automatic environment management.

- AlphaGenome (Google DeepMind, Nature 2026) predicts 5,930 human functional
+ AlphaGenome (Google DeepMind, Nature 2026) predicts 5,731 human functional
genomic tracks at single base-pair resolution from up to 1 MB of DNA
sequence using a JAX-based model.

2 changes: 1 addition & 1 deletion scripts/build_backgrounds_borzoi.py
@@ -1,7 +1,7 @@
"""Build per-track background distributions for Borzoi.

Produces ``borzoi_pertrack.npz`` with three CDF matrices (effect, summary,
- perbin) per track for all 7,612 Borzoi tracks: CAGE, RNA, DNASE, ATAC,
+ perbin) per track for all 7,611 Borzoi tracks: CAGE, RNA, DNASE, ATAC,
CHIP-TF, CHIP-Histone.

RNA-seq tracks use **exon-precise sampling**: only bins overlapping