UX consistency: multi-oracle + causal reports use enriched CHIP labels + shared percentile format #27
Open
lucapinello wants to merge 89 commits into main from
Conversation
…und distributions
- New chorus/analysis module: multi-layer scoring (scorers.py), variant reports, quantile normalization, batch scoring, causal prioritization with enriched HTML tables (gene, cell type, per-layer score columns, top-3 IGV signal tracks), cell type discovery, region swap, integration simulation
- New chorus/analysis/build_backgrounds.py: variant effect and baseline signal background distributions for quantile normalization, with batch GPU scripts
- 8 application examples with full outputs: variant analysis (SORT1, TERT, BCL11A, FTO across AlphaGenome/Enformer/ChromBPNet), causal prioritization, batch scoring, cell type discovery, sequence engineering (region swap + integration simulation)
- Validation against the AlphaGenome paper: SORT1 confirmed, TERT partially confirmed (ELF1 limitation documented), HBG2 not reproduced in K562 or monocytes (ISM vs log2FC methodology difference documented with side-by-side comparison)
- Fix mamba PATH resolution in environment runner and manager
- Add gene_name and cell_type fields to CausalVariantScore and BatchVariantScore
- 500 common SNPs BED file for background computation
- 91 tests covering all analysis components
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The annotation module was re-parsing the full 1 GB GENCODE GTF file on every call to get_genes_in_region, get_gene_tss, and get_gene_exons (~11 s each). Now the GTF is loaded once per feature type (gene/transcript/exon) and cached as a DataFrame for the process lifetime. Exon lookups use a groupby index for O(1) gene-name access.
Before: 11,000 ms per query (full GTF scan)
After: 0.03 s genes, 0.04 s TSS, 1.5 ms exons (cached)
Full analysis test suite now completes in 2 min (was timing out at 10+ min).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
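The load-once-then-index pattern described above can be sketched in a few lines. This is a simplified, stdlib-only illustration: the real module caches pandas DataFrames per feature type, and all names here (load_features, get_gene_exons) are illustrative, not the actual chorus API.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def load_features(gtf_path: str, feature_type: str) -> dict:
    """Parse the GTF once per (path, feature_type) and index by gene name."""
    index = {}
    with open(gtf_path) as fh:
        for line in fh:
            if line.startswith("#"):
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 9 or fields[2] != feature_type:
                continue
            chrom, start, end, attrs = fields[0], int(fields[3]), int(fields[4]), fields[8]
            # Pull gene_name "X"; out of the GTF attribute column.
            marker = 'gene_name "'
            pos = attrs.find(marker)
            if pos == -1:
                continue
            name = attrs[pos + len(marker):attrs.index('"', pos + len(marker))]
            index.setdefault(name, []).append((chrom, start, end))
    return index

def get_gene_exons(gtf_path: str, gene: str) -> list:
    # O(1) dict lookup after the first (cached) parse of the file.
    return load_features(gtf_path, "exon").get(gene, [])
```

The first call pays the full parse cost; every later call for the same file and feature type is a dict lookup, which is what turns 11 s queries into millisecond ones.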
…ores
Single-process AlphaGenome script that extracts all 3,763 valid tracks (711 cell types × 6 output types) from each forward pass. Same GPU time as K562-only (~55 min total on an A100) but yields comprehensive per-layer distributions across all cell types.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comprehensive background distribution builder:
- 10K random SNPs from the hg38 reference across all autosomes
- 20K protein-coding gene TSS positions (promoter state baselines)
- 5K random genomic positions (general baseline)
- Parallel GPU execution: --part variants --gpu 0 / --part baselines --gpu 1
- All 3,763 AlphaGenome tracks extracted per forward pass
Expected output: ~37M variant scores + ~94M baseline samples in ~18 hours.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…n counting
Covers all AlphaGenome output types:
- Window-based: DNASE, ATAC, CHIP_TF, CHIP_HISTONE, CAGE, PROCAP, SPLICE_SITES, SPLICE_SITE_USAGE
- Exon-counting: RNA_SEQ (sum across merged protein-coding exons per gene)
- All backgrounds unsigned (abs magnitude) for quantile ranking
Pre-loads GENCODE v48 gene annotations and builds a spatial index for fast exon lookup within prediction windows.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…haul
Analysis framework:
- PerTrackNormalizer with per-track CDFs (effect, activity, perbin) for all 6 oracles
- Auto-download backgrounds from HuggingFace on oracle load
- AnalysisRequest dataclass preserves user's original prompt on every report
- Magnitude-gated interpretation labels ("Very strong" requires |effect| > 0.7)
- Top-10-per-layer cap in markdown reports with truncation footer
- Biological interpretation + suggested next steps on all 14 example outputs
- Literature caveats where oracle predictions diverge from published biology
Bug fixes:
- Sequence.slice() missing self argument in interval.py (broke Enformer predictions)
- oracle_name="oracle" placeholder in region_swap, integration, discovery
- Cell-type column bloat in batch_scoring and causal (thousands of cell types)
- Corrupted igv.min.js in 15 HTML files from misplaced injection into JS string
- predict() called with string region instead of (chrom, start, end) tuple
MCP server:
- All 8 critical tools accept user_prompt and forward it into reports
- _safe_tool decorator returns structured {"error", "error_type"} on failure
- Improved docstrings: score_variant_batch (variant dict schema), discover_variant_cell_types (runtime + cell count), fine_map_causal_variant (composite formula + output columns)
- Causal table shows Top Layer column; batch scoring resolves track IDs to human-readable names
Application examples (14 folders, all with MD/JSON/TSV/HTML):
- Regenerated all variant_analysis, validation, discovery, causal, batch, sequence_engineering
- Every report has Analysis Request header + Interpretation section
- Cleaned stale intermediate files (5 removed)
- IGV browser verified working in headless Chrome
Documentation:
- README: "Start here" applications callout, updated MCP tools list, MCP walkthrough link
- API_DOCUMENTATION: application layer section (all 6 functions + AnalysisRequest)
- MCP_WALKTHROUGH.md: 5 example conversations showing natural-language usage
- Natural-language framing notes on all 7 category READMEs
- Fixed AlphaGenome HF URL, clarified environment.yml vs chorus-base.yml
- Notebook install banners for comprehensive/advanced (all 6 oracles required)
Scripts:
- Internal scripts moved to scripts/internal/
- regenerate_examples.py + regenerate_remaining_examples.py for reproducible output generation
- scripts/README.md updated with public script descriptions
Testing:
- 268 tests passed (including new magnitude-gate and causal-table tests)
- All 3 notebooks executed end-to-end (Enformer, all 6 oracles, multi-oracle analysis)
- IGV browser rendering verified via Selenium in headless Chrome
- MCP server startup verified
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
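The `_safe_tool` decorator mentioned under "MCP server" can be sketched roughly as below. The `{"error", "error_type"}` keys come from the commit message; the logging and wrapping details are assumptions, so treat this as a pattern sketch rather than the actual implementation.

```python
import functools
import logging

logger = logging.getLogger(__name__)

def _safe_tool(func):
    """Wrap an MCP tool so exceptions come back to the client as
    structured data instead of crashing the server."""
    @functools.wraps(func)  # preserves the tool's name for registration
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            logger.exception("tool %s failed", func.__name__)
            return {"error": str(exc), "error_type": type(exc).__name__}
    return wrapper

@_safe_tool
def score_variant_batch(variants):
    # Hypothetical tool body used only to show the decorator in action.
    if not variants:
        raise ValueError("no variants provided")
    return {"scored": len(variants)}
```

Because `functools.wraps` copies `__name__`, the decorated function still registers under its own tool name, which is what the "function name preservation" test later in this PR checks.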
These are replaced by the per-oracle build_backgrounds_*.py scripts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove stale outputs containing machine-specific paths (/Users/lp698/...) and runtime-specific logs. Notebooks should be committed clean so new users run them fresh in their own environment. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace /PHShome/lp698/chorus with REPO_ROOT (computed from __file__) in all 8 public scripts (6 build_backgrounds + 2 regenerate)
- Clear stale notebook output cells containing machine-specific paths
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
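The REPO_ROOT-from-`__file__` pattern is a one-liner; a minimal sketch, assuming a layout like `<repo>/scripts/build_backgrounds_*.py` (the helper name and depth are illustrative):

```python
from pathlib import Path

def find_repo_root(script_file: str) -> Path:
    """Repo root = parent of the scripts/ directory containing this script."""
    return Path(script_file).resolve().parent.parent

# In a real script: REPO_ROOT = find_repo_root(__file__)
# so output paths become REPO_ROOT / "backgrounds" / ... instead of
# a hardcoded machine-specific absolute path.
```

This keeps the scripts runnable from any clone location, which is the point of the fix.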
…ll traceability
Batch scoring:
- Per-track columns (one per assay:cell_type) with raw score + percentile
- track_scores dict preserved on BatchVariantScore for programmatic access
- display_mode parameter: "by_assay" (default), "by_cell_type", "summary"
- Track ID footnotes for tracing back to oracle data
- oracle_name parameter fixes normalizer CDF lookup (was returning None)
Causal prioritization:
- Per-track columns replacing the generic "Max Effect / Top Layer"
- Each cell shows raw effect + percentile for each scored track
- track_scores dict on CausalVariantScore
Report infrastructure:
- report_title field: "Region Swap Analysis Report", "Integration Simulation Report"
- modification_region: IGV highlights the full replaced/inserted region (not 2-3 bp)
- modification_description: documents what was inserted/replaced and its length
- has_quantile scoping fix (UnboundLocalError on empty allele_scores)
All examples regenerated with biologically specific tracks:
- SORT1: HepG2 DNASE + CEBPA + CEBPB + H3K27ac + CAGE (reproduces Musunuru)
- BCL11A: K562 DNASE + GATA1 + TAL1 + H3K27ac + CAGE (reproduces Bauer)
- FTO: HepG2 tracks (nearest metabolic cell type available)
- TERT: K562 tracks
- Validation: forced HepG2 CEBP tracks matching the AlphaGenome paper
Every report carries the user's original prompt (Analysis Request block). All 13 examples verified: MD + JSON + TSV + HTML, prompt present, 268 tests pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…racle table
- Remove ChromBPNet loading and "Combining oracles" sections (they belong in the main README)
- Rename "Window" to "Output window" + add Resolution column
- Separate Effect percentile and Activity percentile explanations
- Add recommendation to start with AlphaGenome
- Remove Python API details (get_normalizer) that don't belong here
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- .mcp.json: drop the /data/pinello/... PATH hardcoding so new users can use the file as-is from `curl` in any environment. mamba resolves the chorus env without an explicit PATH override.
- README.md: add an LDlink token setup section under Troubleshooting — the fine_map_causal_variant auto-fetch path was silently failing for users without a free LDlink API key.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Changes informed by a fresh walkthrough of the README from the perspective of a brand-new user:
- Reorder Installation: Fresh Install now comes before Upgrading (a first-time reader no longer sees "remove existing envs" before they install)
- Consolidate the Fresh Install block to cover env create, pip install, chorus setup --oracle enformer, and chorus genome download hg38, so a user copying the block ends up actually ready to run the Quick Start
- Clarify that the root environment.yml is what you install and the per-oracle YAMLs in environments/ are internal to `chorus setup`
- Quick Start: point to examples/single_oracle_quickstart.ipynb for users who prefer a notebook, and call out the setup prerequisite explicitly
- Annotate the ENCFF413AHU track ID in the DNase snippet so users know what it is before the Discovering Tracks section explains it
- HF_TOKEN: note that Claude Code inherits env from the shell where `claude` is started (the MCP server is spawned by that shell)
- Add a "Further reading" section linking the docs/ folder — previously API_DOCUMENTATION, METHOD_REFERENCE, VISUALIZATION_GUIDE, and IMPLEMENTATION_GUIDE were all invisible to a README-only reader
- Remove REQUIREMENTS_CHECKLIST.md (internal audit scratch file)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…gitignore
Real issues caught by a deeper audit pass and fixed:
- **Duplicate CAGE column headers**: batch scoring tables rendered two
"CAGE:HepG2" columns because both + and - strand tracks have
identical description fields. _track_display_name now appends (+) / (-)
when the assay_id carries a strand suffix, producing unique column
labels in markdown, HTML, and DataFrame outputs.
- **UnboundLocalError in _build_html_report**: has_quantile /
has_baseline were defined inside a for-loop that doesn't execute for
empty allele_scores, causing .to_html() to crash on minimally-populated
reports. Initialise both before the loop (matches the markdown fix).
- **docs/RELEASE_CHECKLIST.md**: internal QA checklist with stale
metrics (references 128 tests when we have 280). Removed — internal
docs shouldn't live in user-facing docs/.
- **API_DOCUMENTATION / METHOD_REFERENCE overlap**: added reciprocal
callouts clarifying that API_DOCUMENTATION is authoritative and
METHOD_REFERENCE is a one-line cheat sheet.
- **logs/ not ignored**: 82 MB of run logs at risk of being committed.
Added to .gitignore.
Test coverage added (+12 tests, 268 → 280 passed):
- TestReportMetadataFields: report_title, modification_region,
modification_description rendering in MD / HTML / dict
- TestBatchDisplayModes: by_assay, by_cell_type, track-ID footnote,
CAGE strand disambiguation in DataFrame columns
- TestSafeToolDecorator: passthrough, exception → error dict,
function name preservation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Columns and TSV headers now show CAGE:HepG2 (+) and CAGE:HepG2 (-) instead of two identical CAGE:HepG2 columns. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
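The strand-disambiguation rule described above amounts to a small label helper. A hypothetical sketch (the real `_track_display_name` and the exact assay-ID strand-suffix format are internal to chorus; the suffix convention below is an assumption):

```python
def track_display_name(assay_id: str, description: str) -> str:
    """Append (+) / (-) to the column label when the assay ID carries a
    strand suffix, so + and - tracks with identical descriptions get
    unique headers in markdown, HTML, and DataFrame outputs."""
    if assay_id.endswith("+"):
        return f"{description} (+)"
    if assay_id.endswith("-"):
        return f"{description} (-)"
    return description  # unstranded tracks keep their description as-is
```

With this rule the two CAGE HepG2 strands render as `CAGE:HepG2 (+)` and `CAGE:HepG2 (-)` instead of two identical columns.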
A new user could miss this entirely — the previous mention was a single
feature bullet ("auto-downloaded from HuggingFace") with no detail.
This section spells out:
- The backgrounds turn raw log2FC into the effect/activity percentiles
shown in every report
- They're fetched on first oracle use from the public HF dataset
lucapinello/chorus-backgrounds and cached in ~/.chorus/backgrounds/
- File sizes per oracle (so users with limited disk know what to expect)
- **No HF_TOKEN required** for backgrounds (only AlphaGenome model is gated)
- LDlink token is separate and only needed for causal auto-fetch
- Optional pre-download snippet for users who want to avoid the first-use
wait
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds a comprehensive reference appendix covering:
- What the backgrounds are (effect %ile vs activity %ile vs per-bin)
and why they exist (turn raw log2FC into genome-aware metrics)
- How they were calculated:
* Variant effect distribution: 10K random SNPs × all tracks with
layer-specific scoring formulas (log2FC, logFC, diff)
* Activity distribution: ~31.5K positions (random intergenic +
ENCODE SCREEN cCREs + protein-coding TSSs + gene-body midpoints)
* Per-bin distribution: 32 random bins per position for IGV scaling
* RNA-seq exon-precise sampling rule
* CAGE summary routing rule
- Sample sizes per oracle (track count, samples per track, NPZ size)
- Python API usage with verified signatures (get_pertrack_normalizer,
download_pertrack_backgrounds, effect_percentile, activity_percentile,
perbin_floor_rescale_batch)
- MCP / Claude usage (auto-attached, zero-config)
- Documented ranges and a sanity-check rule of thumb for interpretation
- How to reproduce or extend the backgrounds via build_backgrounds_*.py
All function signatures in the appendix were verified against the actual
implementation before committing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
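The core idea behind the effect/activity percentiles documented in this appendix is an empirical-CDF rank of the score magnitude against a background sample. A minimal stdlib sketch (the real normalizer uses precomputed per-track CDF arrays from the NPZ files; names and the plain-list background here are illustrative):

```python
import bisect

def effect_percentile(raw_score: float, background: list) -> float:
    """Fraction of background effect magnitudes <= |raw_score|.

    background: absolute variant-effect magnitudes sampled genome-wide
    (the role played by the 10K-random-SNP distribution above)."""
    mags = sorted(abs(x) for x in background)
    rank = bisect.bisect_right(mags, abs(raw_score))
    return rank / len(mags)
```

A score larger than everything in the background ranks at 1.0 ("≥99th percentile" territory); a median-sized score ranks near 0.5. This is what turns a raw log2FC into a genome-aware metric.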
Local-only IDE/agent state (settings, scheduled tasks lock) — per-developer, not for the repo. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…pies
- AUDIT_PROMPT.md: systematic end-to-end audit script for a new machine, with REPLACE_* placeholders for HF_TOKEN and LDLINK_TOKEN.
- .gitignore: block any *_WITH_TOKENS.md or AUDIT_PROMPT_WITH_TOKENS* file from ever being staged, since filled-in copies contain secrets.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Records what worked and what did not on a fresh macOS 15.7.4 / arm64
clone of chorus-applications: full install, all 6 oracle smoke-predicts,
286/286 pytest pass, 3 example notebooks (0 errors), 22 MCP tools registered,
6/6 application tools producing correct outputs (rs12740374 SORT1 case
reproduces the published Musunuru-2010 finding), 18/19 application HTML
reports IGV-verified via headless Chrome, ChromBPNet smoke build completed
end-to-end on CPU.
Top issues a macOS user hits, ranked, with one-or-two-line fixes:
1. No Apple GPU (MPS / Metal / jax-metal) auto-detect — frameworks are
installed but borzoi/sei/legnet only check torch.cuda.is_available(),
SEI hardcodes map_location='cpu', chrombpnet/enformer envs lack
tensorflow-metal. Verified Borzoi runs on MPS in 4.3 s when forced.
2. SEI Zenodo download via stdlib urllib at ~80 KB/s — 3.2 GB tar takes
~11 h. curl -C - -L recovers it in ~30 min.
3. fine_map_causal_variant rsID-only crash (KeyError 'chrom' at
causal.py:355). Workaround: pass "chr1:pos REF>ALT" form.
4. Two-mamba-installs MAMBA_ROOT_PREFIX gotcha breaks chorus health.
5. Notebooks need explicit `python -m ipykernel install --user --name chorus`.
6. SEI download has no single-flight lock — concurrent inits race.
Verdict: production-ready with caveats. None of the issues block correctness;
all are operational or one-line code fixes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Addresses every actionable item in audits/2026-04-14_macos_arm64.md.
All changes are platform-conditional — Linux CUDA paths are unchanged.
PyTorch oracles (borzoi, sei, legnet) — auto-detect MPS on Apple Silicon
- Both the in-process loader (chorus/oracles/{borzoi,sei,legnet}.py) and
the subprocess templates ({borzoi,sei,legnet}_source/templates/{load,
predict}_template.py) now resolve `device is None` (or the new 'auto'
sentinel) as: cuda > mps > cpu. Linux + CUDA box hits the cuda branch
first, no behavior change there.
- SEI: replaced the hard `map_location='cpu'` device pin. CPU is still
  used to load the weights into host memory before `.to(device)`, which is
  the standard pattern across torch versions and works for MPS too.
- Sei BSplineTransformation lazily moved its spline matrix only when
`input.is_cuda`. Generalized to any non-CPU device so the matmul works
on MPS as well. Verified: 286/286 pytest still pass.
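The cuda > mps > cpu resolution described above is a short fallback chain. A sketch with the backend checks passed in as booleans so the ordering is visible (and testable) without a GPU; the real loaders query `torch.cuda.is_available()` and `torch.backends.mps.is_available()` directly, and this helper name is illustrative:

```python
def resolve_device(requested, cuda_available, mps_available):
    """Resolve None / 'auto' to the best available torch device string."""
    if requested not in (None, "auto"):
        return requested      # an explicit device= always wins
    if cuda_available:
        return "cuda"         # Linux CUDA boxes hit this branch first
    if mps_available:
        return "mps"          # Apple Silicon
    return "cpu"
```

Because CUDA is checked first, a Linux box with a GPU sees no behavior change from the MPS addition, which is the platform-conditional guarantee this commit makes.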
TensorFlow oracles (chrombpnet, enformer) — Metal backend on Apple Silicon
- chorus/core/platform.py macos_arm64 adapter now adds
`tensorflow-metal>=1.1.0` to pip_add. Once installed, Apple's plugin
registers a 'GPU' physical device, so the oracles' existing
tf.config.list_physical_devices('GPU') auto-detect picks it up with no
code change. Linux paths don't see the macos_arm64 adapter so CUDA stays
intact.
JAX oracle (alphagenome) — unchanged
- Already explicitly skips Metal in auto-detect (jax-metal still missing
`default_memory_space` for AlphaGenome). README updated to document
this trade-off.
MCP fix — fine_map_causal_variant rsID-only crash
- Calling `fine_map_causal_variant(lead_variant="rs12740374")` previously
raised KeyError: 'chrom' at chorus/analysis/causal.py:355 because
`_parse_lead_variant("rs12740374")` returns {"id": ...} only.
- Backfill chrom/pos/ref/alt onto the sentinel from the LDlink response
(which always carries them) before invoking prioritize_causal_variants.
- Verified end-to-end: rs12740374 ranked #1 with composite=1.000 of 12 LD
variants on AlphaGenome (matches the published Musunuru-2010 finding).
SEI Zenodo download — chunked + resume + single-flight lock
- Replaced urllib.request.urlretrieve with a stdlib chunked urlopen loop
that supports HTTP Range resume and an fcntl exclusive lock so two
concurrent SeiOracle inits don't race the same partial file. Original
observed throughput on macOS was ~80 KB/s (would take ~11 hours for the
3.2 GB tar); the new path resumes interrupted downloads and progress-
logs every 100 MB.
README — macOS troubleshooting + Apple GPU policy table + kernel install
- Documented the two-mamba-installs MAMBA_ROOT_PREFIX gotcha that breaks
`chorus health` when the new chorus env lands in a different mamba root
than the per-oracle envs.
- Added the per-oracle macOS GPU support matrix (MPS / Metal / CPU) with
explicit `device=` examples.
- Added the missing `python -m ipykernel install --user --name chorus`
step to Fresh Install so examples/*.ipynb find the chorus kernel.
Validation on macOS 15.7.4 / Apple Silicon (CPU + MPS + Metal):
- 286/286 pytest pass (incl. all 6 oracle smoke-predict tests)
- chorus.create_oracle('borzoi') auto-detects mps:0
- chorus.create_oracle('sei') auto-detects mps:0 + smoke-predict ok
- chrombpnet env now reports tf.config.list_physical_devices('GPU') = [GPU:0]
- fine_map_causal_variant(lead_variant='rs12740374') ranks rs12740374
composite=1.000 of 12 LD variants
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…EI resumable download, rsID backfill)
Verified on Linux CUDA: 285/285 code tests pass. The AlphaGenome smoke test errors due to an expired HF token in the chorus-alphagenome env (unrelated to this PR).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- tests/test_mcp.py: TestFineMapRsidBackfill verifies fine_map_causal_variant backfills chrom/pos/ref/alt when a caller passes only an rsID lead_variant. Regression test for the macOS audit crash (KeyError: 'chrom').
- examples/*.ipynb: re-executed all three notebooks end-to-end on Linux CUDA to refresh outputs against the merged audit branch.
Full suite now: 286/286 tests pass (including the alphagenome real-oracle smoke test).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Regeneration on Linux CUDA GPU 1 (GPU 0 was full):
- AlphaGenome variant + validation (5 examples) — 28 min
- Remaining AlphaGenome (batch/causal/discovery/seq) — ~14 min
- Enformer SORT1 — 1 example
- ChromBPNet SORT1 — 1 example
Discovery HTML filenames now use oracle_name "alphagenome" (was placeholder "oracle").
Verification:
- 289/289 tests pass across combined runs (6 oracle smoke tests green on GPU 1)
- Selenium screenshot sweep: 18/19 HTML render cleanly; the one "NO-IGV" is the batch_scoring HTML, which is a scoring table by design (no browser view)
- Hardcoded /PHShome/lp698/chorus paths in notebook log outputs redacted to /path/to/chorus
.gitignore: ignore examples/applications/**/*_screenshot.png so selenium artifacts don't pollute the repo.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…lper
Adds audits/2026-04-15_macos_arm64_post_merge.md — the second-pass end-to-end audit on a fully wiped + fresh-cloned install, after the PR #7 macOS-support changes were merged. Every fix from v1 is confirmed working on a clean setup:
* chrombpnet + enformer envs now pull in tensorflow-metal automatically → `Auto-detected 1 GPU(s) … name: METAL`
* borzoi/sei/legnet auto-detect mps:0
* fine_map_causal_variant("rs12740374") rsID-only returns rs12740374 composite=0.963 of 12 LD variants (was KeyError in v1)
* analyze_variant_multilayer reproduces Musunuru-2010 biology (CEBPA strong binding gain +0.37, DNASE strong opening +0.43)
* 286/286 pytest, 0 notebook errors, 19/19 IGV reports ok
Two download-reliability findings surfaced on this clean run (both pre-existing, both the same bug class as the SEI fix that landed in PR #7): chorus/utils/genome.py stalled at 36% of the hg38 download, and chorus/oracles/chrombpnet.py has no single-flight lock, so two concurrent callers race the ENCODE tar and hit EOFError.
This commit also adds chorus/utils/http.py — the resume+lock helper that previously lived inside SeiOracle, now extracted as a shared stdlib-only utility so genome + chrombpnet can reuse it. The sei.py helper shim keeps the old public API working.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…elper
Three call sites fetch large files from the public internet with plain
urllib.request.urlretrieve (no resume, no concurrency lock). The
2026-04-15 v2 audit on a fresh install hit two of them the hard way:
UCSC cut the hg38 connection at ~36% of the 938 MB download
(urllib.error.URLError: retrieval incomplete: got only 363743871 out
of 983659424 bytes), and two concurrent callers of
_download_chrombpnet_model raced the same partial ENCODE .tar.gz so
one read it mid-write and hit
EOFError: Compressed file ended before the end-of-stream marker was
reached
inside tarfile.extractall.
Re-use the resume+lock helper introduced for SEI in PR #7, lifted
into chorus/utils/http.py in the preceding commit:
chorus/oracles/sei.py
_download_with_resume staticmethod becomes a thin shim that
forwards to chorus.utils.http.download_with_resume. No behaviour
change and no API break.
chorus/utils/genome.py
GenomeManager.download_genome swaps urllib.request.urlretrieve
for download_with_resume. Fixes the UCSC stall observed in the
v2 audit; partial .fa.gz is now resumable across retries.
chorus/oracles/chrombpnet.py
_download_chrombpnet_model (ENCODE tar) and _download_jaspar_motif
(JASPAR motif) both route through download_with_resume. The fcntl
lock on <dest>.lock serialises concurrent callers so the pytest
smoke fixture and a background build_backgrounds_chrombpnet.py
job can no longer corrupt each other's download.
All three changes are platform-agnostic; the helper is stdlib-only
(urllib + fcntl). Linux CUDA is not touched.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
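The resume + single-flight pattern the shared helper implements can be sketched with the stdlib alone. This is a simplified illustration of the mechanism (Range resume, fcntl lock on `<dest>.lock`, atomic publish), not the actual `chorus.utils.http.download_with_resume`, which also progress-logs and should verify a 206 response before appending to a partial file:

```python
import fcntl
import os
import urllib.request

def download_with_resume(url: str, dest: str, chunk_size: int = 1 << 20) -> str:
    """Download url to dest; resumable via .partial, serialized via flock."""
    partial = dest + ".partial"
    with open(dest + ".lock", "w") as lockfh:
        fcntl.flock(lockfh, fcntl.LOCK_EX)   # single-flight: rivals block here
        if os.path.exists(dest):             # another caller finished first
            return dest
        offset = os.path.getsize(partial) if os.path.exists(partial) else 0
        req = urllib.request.Request(url)
        if offset:
            # Resume where the last attempt stopped. (A robust version
            # checks the server answered 206 Partial Content first.)
            req.add_header("Range", f"bytes={offset}-")
        with urllib.request.urlopen(req) as resp, open(partial, "ab") as out:
            while True:
                chunk = resp.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
        os.replace(partial, dest)            # atomic publish on completion
    return dest
```

The exclusive lock is what prevents the EOFError race described above: a second caller waits, then sees the finished file and returns immediately instead of reading a tar mid-write.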
…S audit
The v2 audit confirmed post-merge macOS works end-to-end (286/286 tests, 19/19 HTML, reproduces Musunuru-2010 biology, rsID backfill verified). Adds chorus/utils/http.py (resume + fcntl lock) and routes the hg38 genome, chrombpnet ENCODE tar, and JASPAR motif downloads through it. The SEI helper becomes a shim for backward compatibility.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Third-pass audit going one level deeper than v1 (pre-merge smoke) and
v2 (post-merge fresh install). Scope: 14 application examples × 19 HTML
reports × 6 per-track normalizer NPZs + the scoring/normalization stack.
Read-only deliverable — the fixes identified here belong in a separate
follow-up PR after review.
Findings (5, ranked by severity):
1. HIGH — chrombpnet_pertrack.npz:DNASE:hindbrain has 0 background
samples and an all-zeros CDF. PerTrackNormalizer.effect_percentile()
silently returns 1.0 for every raw_score (including 0.0) because
np.searchsorted on a zeros row ranks everything at the end, and
_get_denominator falls through to cdf_width=10000 when counts[idx]=0.
Same bug class as the v2 concurrent-download race that landed in
PR #8 — the hindbrain model download failed silently and left a
zero-count reservoir. Impact: any variant scored against
DNASE:hindbrain in ChromBPNet gets a false "100th percentile".
2. MEDIUM — every committed HTML report loads igv.min.js from
cdn.jsdelivr.net at view time. 2/19 reports flaked on
net::ERR_CERT_AUTHORITY_INVALID during this audit; any user
behind a corporate proxy / airgapped network / jsdelivr outage
will see IGV silently fail with no fallback. No SRI either.
3-5. LOW — documentation improvements:
- TERT_promoter example doesn't caveat that C228T's published
biology is melanoma-specific; K562 result (all negative) is
correctly modelled but reads as "no effect" without context
- AlphaGenome DNASE vs ChromBPNet ATAC disagree on rs12740374
direction in HepG2 (+0.45 vs -0.11); no application note
teaches this real cross-oracle divergence
- HBG2_HPFH footer notes BCL11A/ZBTB7A catalog absence; could
be tightened
Normalization stack verified clean:
- CDF monotonicity: 0 bad rows across 18,159 tracks × 10,000 points
- signed_flags match LAYER_CONFIGS.signed exactly (AG 667 RNA-seq,
Borzoi 1543 stranded RNA, SEI 40/40 regulatory_classification,
LegNet 3/3 MPRA; Enformer 0 signed is correct — no RNA-seq)
- Build-vs-scoring window_bp bit-identical via shared LAYER_CONFIGS
- Pseudocount/formula: _compute_effect reproduces reference
implementation with diff=0.0 across all test cases
- perbin_floor_rescale_batch math verified at all edges
- Edge cases: unknown oracle → None, unknown track → None,
raw=0 → 0.0, raw=huge → 1.0
Phase A rerun on 4 AlphaGenome literature-checked cases (SORT1, TERT,
FTO, BCL11A) confirms biology is preserved but results are NOT bit-
identical — raw_score drift ~1-2% on dominant tracks, larger quantile
swings on near-zero tracks due to AlphaGenome's JAX CPU non-
determinism. No committed example is stale. Noise-floor handling for
|raw_score| < ~1e-3 added to follow-up recommendation list.
Artifacts:
- audits/2026-04-16_application_and_normalization_audit.md (main report)
- audits/2026-04-16_screenshots/*.png (19 full-page PNGs)
- audits/2026-04-16_data/*.json (per-app cards + normalization/selenium/rerun JSON)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ixes
Addresses the findings in audits/2026-04-16_application_and_normalization_audit.md (PR #9). Four categories of change:
1. Delete two example applications the audit recommends removing:
   - examples/applications/variant_analysis/TERT_promoter/
     C228T is a melanoma-specific gain-of-function mutation; the example runs it in K562 (erythroleukemia) and shows all-negative effects. The biology is correct for the model but inverts the published direction. Rather than add a "wrong cell type" caveat, drop the example — SORT1 / FTO / BCL11A cover variant_analysis without teaching the reader a misleading result.
   - examples/applications/validation/HBG2_HPFH/
     Already self-documented as "Not reproduced" in validation/README.md: BCL11A / ZBTB7A aren't in AlphaGenome's track catalog, so the repressor-loss mechanism isn't visible. Keeping a "validation failed" example alongside the working SORT1_rs12740374_with_CEBP confuses readers. Drop it.
   Also updated: root README.md (replaces the HBG2_HPFH link with SORT1_rs12740374_with_CEBP), examples/applications/variant_analysis/README.md (drops the TERT prompt + section), examples/applications/validation/README.md (drops the HBG2 row + section + reproduce snippet), scripts/regenerate_examples.py + scripts/internal/inject_analysis_request.py (both lose their TERT_promoter/HBG2_HPFH entries).
2. Normalizer: guard against zero-count CDF rows (chorus/analysis/normalization.py).
   Audit finding #1 (HIGH): the committed chrombpnet_pertrack.npz has DNASE:hindbrain with effect_counts[idx] == 0 and a zero-filled CDF row. effect_percentile() / activity_percentile() silently returned 1.0 for every raw_score (including 0.0) because np.searchsorted on a zeros row returns len(row) for any non-negative probe and the denominator falls through to cdf_width. Same bug class as the v2 chrombpnet concurrent-download race that landed in PR #8 — the hindbrain ENCODE tar must have failed to extract cleanly during the original background build.
   The new private helper _has_samples() returns False when counts[idx] == 0, which makes _lookup / _lookup_batch return None. Callers already render None as "—" in MD/HTML tables, so users now see "no background" instead of a silent false "100th percentile". Counts-less NPZs (older format, no counts field) are treated as valid — no regression.
3. Report: suppress quantile_score when raw_score is in the noise floor (chorus/analysis/variant_report.py).
   Audit finding #6 (LOW): when |raw_score| < 1e-3 the effect CDF is so densely clustered around 0 that a 1-2% raw-score drift can swing the quantile by 0.5+ (observed in the Phase A rerun: committed quantile=1.0 vs rerun=0.21 for a CEBPB track with raw_score ~1e-4). Set quantile_score = None in that regime so the HTML/MD tables render "—" and readers don't misread noise as signal. The threshold was chosen conservatively to cover both log2fc (pc=1.0) and logfc RNA (pc=0.001) without hiding real effects.
4. IGV.js: lazy-download the bundle into ~/.chorus/lib on first use (chorus/analysis/_igv_report.py + chorus/analysis/causal.py).
   Audit finding #2 (MEDIUM): reports embed a <script src="..."> to cdn.jsdelivr.net that gets evaluated every time the HTML is opened in a browser. Any viewer on an airgapped network / corporate proxy that MITMs TLS / during a jsdelivr outage sees IGV silently fail (2/19 audit reports hit ERR_CERT_AUTHORITY_INVALID). The local-cache code path already existed but was opt-in (the user had to drop a file in ~/.chorus/lib/igv.min.js manually). The new _ensure_igv_local() helper runs on the first report generation and populates the cache via chorus.utils.http.download_with_resume (the helper that landed in v2 PR #8). Reports written after the first successful download inline the JS directly — self-contained HTML that opens anywhere without network. Download failure is logged at WARNING and the CDN <script> tag is used as a fallback, preserving the current behaviour for anyone who can't reach jsdelivr at generation time.
All changes are platform-agnostic; 287/287 pytest continue to pass. The fix was verified behaviourally:
>>> norm.effect_percentile('chrombpnet', 'DNASE:hindbrain', 0.0)
None   # was: 1.0
>>> norm.effect_percentile('chrombpnet', 'DNASE:HepG2', 0.0)
0.0    # unchanged
>>> ts = TrackScore(raw_score=0.0005, ...)
>>> _apply_normalization(ts, ...); ts.quantile_score
None   # noise floor
See audits/2026-04-16_application_and_normalization_audit.md (PR #9) for full context, per-app screenshots, and the Phase A / B / C methodology behind each finding.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
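The zero-count CDF bug and its guard can be demonstrated in miniature, with `bisect.bisect_right` standing in for `np.searchsorted(..., side='right')` (same semantics). Names below are illustrative, not the actual normalizer API:

```python
import bisect

def effect_percentile_guarded(cdf_row, n_samples, raw_score):
    """Rank |raw_score| against a per-track CDF row, refusing to rank
    against a track that has no background samples."""
    if n_samples == 0:   # the _has_samples() guard from the fix
        return None      # callers render this as "—" (no background)
    rank = bisect.bisect_right(cdf_row, abs(raw_score))
    return rank / len(cdf_row)

# The bug: ranking any non-negative score (even 0.0) against an
# all-zeros row places it at the end of the row, i.e. a false
# "100th percentile" for a track whose background build silently failed.
zeros_row = [0.0] * 10
```

Without the `n_samples == 0` guard, `bisect_right(zeros_row, 0.0)` returns 10, so every score against DNASE:hindbrain would rank at 10/10 = 1.0, which is exactly the false "100th percentile" the audit caught.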
Screenshot review found causal prioritization MD+HTML still used the old '(100%)' / '(95%)' percentile format. The variant_report tables and batch_scoring tables already used '≥99th' / '0.95' / '≤1st' via _fmt_percentile (added in the audit pass). Causal was the last remaining site. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two scary-looking warnings surfaced while reading notebook cell outputs in the v7 audit. Neither is a real problem, but both alarm users:

1. chorus/core/base.py:323 — case-sensitive compare of reference allele vs genome. pyfaidx returns lowercase for softmasked (repetitive) regions; users always pass uppercase. The previous code fired 'Provided reference allele is not the same as the genome reference' on every variant in a softmasked locus (e.g. GATA1 TSS in quickstart notebook cell 39, comprehensive notebook cells 35 and 51). Now uses .upper() on both sides; the warning message also includes the actual allele pair so users can confirm.

2. chorus/core/result.py:104 — 'Unknown implementation' warning fired for every Sei track (Stem cell / Multi-tissue / H3K4me3 etc.) that isn't in the hardcoded assay_type registry. The generic fallback works correctly; the warning was just noise. Downgraded to logger.debug.

Scientific review of outputs:
- SORT1 rs12740374: predictions match the Musunuru 2010 mechanism (CEBPA/B binding gain, DNASE opening, H3K27ac gain, CAGE TSS increase) ✓
- BCL11A rs1427407: TAL1 binding loss + DNASE closing in K562 ✓
- FTO rs1421085: minimal effects in HepG2 (expected — adipose tissue) ✓
- TERT chr5:1295046 T>G: E2F1 binding gain + TERT TSS CAGE increase ✓
- SORT1 causal: rs12740374 ranks #1, composite=0.964 ✓
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
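The case-folding fix in item 1 amounts to something like the sketch below — `check_reference_allele` is an illustrative name, not the real chorus API.

```python
import warnings

def check_reference_allele(provided: str, genome_base: str) -> bool:
    # Case-insensitive compare: pyfaidx returns lowercase bases in
    # softmasked (repetitive) regions, while users pass uppercase.
    if provided.upper() == genome_base.upper():
        return True
    # Include the actual allele pair so users can confirm the mismatch.
    warnings.warn(
        f"Provided reference allele {provided!r} does not match the "
        f"genome reference {genome_base!r}"
    )
    return False

print(check_reference_allele("T", "t"))  # True  (softmasked base, no warning)
print(check_reference_allele("T", "g"))  # False (genuine mismatch, warns)
```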
…ght CI

v8 found zero action items but called out four scenarios it did not exercise. This PR closes them.

Fast suite (299 → 303): tests/test_error_recovery.py
- HF download ConnectionError → graceful return + warning log
- download_with_resume .partial file resume via Range header
- AlphaGenome missing HF_TOKEN → friendly, actionable error
- Missing oracle env → "chorus setup" hint + graceful fallback
All four are mocked, run in ~1.5 s, no network.

Integration suite (gated by @pytest.mark.integration): tests/test_integration.py
- SEI + LegNet CDF download from the HF dataset (v8 didn't trigger these because no regen workflow uses sei/legnet) — verified the NPZs load and pass monotonicity + p50/p95/p99 + counts checks
- ChromBPNet ATAC:K562 fresh download from ENCODE (~500 MB, 8 min) to a tmp dir — verifies the shared download_with_resume helper end-to-end without touching the 37 GB real cache
- First E2E test of the MCP server: spawn a chorus-mcp stdio subprocess via fastmcp Client, call list_oracles + load_oracle + analyze_variant_multilayer on SORT1 rs12740374 with real AlphaGenome predict (~4.5 min)

Light CI: .github/workflows/tests.yml
- Runs the fast suite on every PR and push to main/chorus-applications
- Linux ubuntu-latest + Miniforge + mamba + pip install -e .
- Skips smoke tests (~10 GB models exceed runner disk) and integration tests (too slow for per-PR feedback)
- workflow_dispatch for manual maintainer runs

pytest.ini: registers the `integration` marker.

Verified:
- pytest -m "not integration" → 303 passed, 4 deselected (58.9 s)
- pytest -m integration → 4 passed (13 min total)
- Fresh ATAC:K562 tarball streamed with .partial + fcntl lock, fold 0 weights loaded into TF, 2114 bp predict returns finite values
- chorus-mcp subprocess round-trips the analyze_variant_multilayer response back to the client
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…unts

Full sweep of user-facing docs after the multi-pass audit series. 11 fixes:

BLOCKS_USER (wrong biology in example READMEs):
- variant_analysis/SORT1_chrombpnet/README.md: claimed "+0.441 Strong opening"; actual is "-0.111 Moderate closing". Replaced with correct values + a cross-oracle divergence explanation.
- sequence_engineering/region_swap/README.md: the entire scenario was wrong (promoter swap at chr1:1000500 vs the actual SORT1 enhancer replacement at chr1:109274500). Rewrote from scratch to match the actual example_output.md.
- sequence_engineering/integration_simulation/README.md: wrong direction (DNASE -0.900 vs actual +4.22), wrong filename. Rewrote.
- variant_analysis/SORT1_enformer/README.md: stale HepG2-focused numbers, but the actual example is discovery-mode. Rewrote with top hits from the current output.

CONFUSING (stale numbers):
- causal_prioritization/README.md:116: composite 0.898 → 0.964
- batch_scoring/README.md example table: old 4-column format → new per-track Ref/Alt/log2FC/Effect %ile format
- causal SORT1_locus/example_output.md: (100%) → (≥99th) via in-place patch (percentile format now consistent with other reports)
- validation/README.md:33,58: stale TERT CAGE +0.120 → +0.34

Track count drift (pick-one):
- 5,930 / 5930 → 5,731 across docs/API_DOCUMENTATION.md, docs/variant_analysis_framework.md, README.md, examples/applications/
- 230 bp → 200 bp for LegNet input size across the same files

POLISH:
- docs/IMPLEMENTATION_GUIDE.md tree: removed "# Placeholder" markers on borzoi/chrombpnet/sei (all implemented); added analysis/ and mcp/ directories; updated utils/ to reflect current files.
- multilayer_variant_analysis.md: "Quantile range" header → "Effect percentile range" to match the rest of the repo's terminology.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Full fresh-install audit at fbaef50 after wiping 13.2 GB (7 mamba envs, ~/.chorus/, HF chorus models). This pass focused on reading the actual content of every example output, not just error counts.

Four findings, all low-to-medium and environmental:

1. MEDIUM: TF Hub's /var/folders/.../tfhub_modules/ cache survives chorus teardowns. A stale partial download from a prior session made the Enformer smoke test fail with "'saved_model.pb' nor 'saved_model.pbtxt'". Clearing it fixes the failure; the README should document this.
2. MEDIUM (regression from v8): on this audit's SSL-MITM'd network, stdlib urllib in download_with_resume fails on cdn.jsdelivr.net/igv.min.js with a cert-verify error. 6/16 HTMLs landed on the CDN <script> fallback. huggingface_hub's httpx works through the same proxy — the robust fix is to mirror igv.min.js on the HF chorus-backgrounds dataset and fall back there when urllib fails.
3. LOW: the FTO README promises adipose tracks, but the example runs with HepG2 (documented only in the prompt, not the README).
4. LOW: notebooks run via `jupyter nbconvert` without `mamba activate chorus` emit 20-60 `bgzip is not installed` ERROR lines per notebook from coolbox. bgzip IS in the env — the PATH just isn't inherited. Plots render via an in-memory fallback, but the user sees scary error spam.

Verified:
- 303/303 fast pytest (17.5 s)
- 6/6 oracle smoke (after the tfhub clear)
- 12/12 examples regenerate, max Δeff 0.036 (CPU non-determinism)
- 0 orphan HTMLs after parallel regen (v6 API fix live)
- 3 notebooks execute, 0 errors, plots render (despite the bgzip noise)
- 16/16 HTMLs show correct biology in a spot-check vs the literature
- 6/6 CDFs pass monotonicity/p50/p95/p99/counts checks (first audit to empirically verify the sei + legnet CDFs; the v9 integration test now automates this)

Biology confirmed on: SORT1 rs12740374 (Musunuru 2010 CEBP mechanism), BCL11A rs1427407 (TAL1 disruption in K562), TERT chr5:1295046 (E2F1 + CAGE activation), region_swap (enhancer removal closes chromatin), integration_simulation (CMV insertion opens chromatin).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ip PATH

All 4 findings from audit PR #20. Fast suite 303 → 308.

1. MEDIUM: TF Hub corrupt-cache recovery (enformer.py + load_template.py). When an earlier Enformer download was interrupted, tfhub_modules/ keeps a directory with no saved_model.pb, and hub.load then raises with the bad path in the message. Added _load_enformer_with_tfhub_recovery, which parses the error, wipes the bad dir, and retries once. Applied to both the in-process (_load_direct) and subprocess (template) paths. The README Troubleshooting section now documents the manual workaround and the auto-recovery behavior.

2. MEDIUM (v8 regression): IGV JS fallback via huggingface_hub (_igv_report.py). On SSL-MITM networks, stdlib urllib rejects the proxy cert and the CDN fetch of igv.min.js fails. Added a secondary fallback via hf_hub_download from lucapinello/chorus-backgrounds — huggingface_hub uses httpx+certifi, which works through the same proxies that block urllib. Graceful no-op if the HF file doesn't exist yet (the existing CDN <script> fallback still kicks in). Uploading igv.min.js to the HF repo is a separate one-time task for the maintainer; until then this code path silently downgrades to the current behavior.

3. LOW: FTO README adipose claim vs HepG2 reality (examples/applications/variant_analysis/FTO_rs1421085/README.md). Rewrote the Tracks section to state accurately that the committed example uses HepG2 as a "nearest metabolic" proxy and shows what a no-signal call looks like. Included ready-to-use adipose-track assay_ids for users who want the biologically ideal run.

4. LOW: bgzip/tabix PATH when nbconvert skips mamba activate (chorus/__init__.py). Prepend sys.executable's bin/ to PATH at chorus import time. coolbox's subprocess calls to bgzip/tabix now succeed instead of emitting 20-60 ERROR lines per notebook and falling back to TabFileReaderInMemory. Cheap and idempotent (only prepended if not already present).
Tests: 5 new in tests/test_error_recovery.py
- test_corrupt_cache_is_cleared_and_retry_succeeds
- test_unrelated_errors_propagate_unchanged
- test_hf_fallback_when_cdn_fails
- test_returns_none_when_both_fail
- test_env_bin_on_path_after_import

Verified: pytest -m "not integration" → 308 passed (was 303).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
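Fix #1's retry logic might look roughly like this sketch; `load_fn` stands in for `hub.load`, and the error-message parsing is an assumption about the message's shape, not the actual implementation.

```python
import os
import re
import shutil

def load_with_corrupt_cache_recovery(load_fn, handle):
    # Attempt the load; on failure, look for a tfhub_modules/ path in the
    # error message, wipe that (presumed corrupt) cache dir, and retry once.
    try:
        return load_fn(handle)
    except OSError as exc:
        m = re.search(r"\S*tfhub_modules\S*", str(exc))
        bad_dir = m.group(0).rstrip(".,:'\"") if m else ""
        if os.path.isdir(bad_dir):
            shutil.rmtree(bad_dir)          # drop the partial download
            return load_fn(handle)          # retry exactly once
        raise                               # unrelated OSErrors propagate unchanged
```

A second failure (or an error that names no cache dir) propagates to the caller, so genuine problems are not masked.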
The v8 audit added an fcntl lock around the OUTER ENCODE tarball
extraction. The v9 scorched-earth audit revealed that two concurrent
callers (MCP load_oracle + a jupyter notebook kernel both asking for
ChromBPNet) still raced on the INNER loop that extracts the three
per-fold subtarballs (bias_scaled / chrombpnet / chrombpnet_nobias),
producing:
FileExistsError: [Errno 17] File exists:
'.../downloads/chrombpnet/DNASE_HepG2/models/fold_0/chrombpnet_nobias'
Fix: re-acquire the lock around the inner loop and skip any t_out
directory that's already populated (empty-check via os.listdir).
Also fix the bare except so the OSError → Exception hierarchy is preserved.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
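A minimal sketch of the locked inner-loop extraction described above — function and path names here are illustrative, not chorus's actual code:

```python
import fcntl
import os
import tarfile

def extract_subtarball_locked(tar_path: str, t_out: str, lock_path: str) -> None:
    # Hold an exclusive fcntl lock while extracting one per-fold subtarball;
    # a concurrent caller blocks here instead of racing on makedirs.
    with open(lock_path, "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)
        try:
            # Skip t_out if another process already populated it
            # (the empty-check via os.listdir from the commit message).
            if os.path.isdir(t_out) and os.listdir(t_out):
                return
            os.makedirs(t_out, exist_ok=True)
            with tarfile.open(tar_path) as tf:
                tf.extractall(t_out)
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)
```

Because the populated-dir check happens while the lock is held, the second caller sees the winner's output and returns instead of hitting FileExistsError.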
Fresh-install audit at e99fd66 verifying all 4 v10 fixes on a truly clean slate. Teardown: 14.2 GB, including tfhub_modules/ this time.

All 4 v10 fixes verified live:
- Fix #1 (tfhub recovery): the code path exists + first-install smoke passes on a wiped tfhub cache.
- Fix #2 (IGV HF fallback): 0/16 HTMLs fell back to the CDN on the same SSL-MITM network that had 6/16 fallbacks in v10.
- Fix #3 (FTO README): accurate HepG2 framing + an adipose assay_ids block for the ideal run.
- Fix #4 (bgzip PATH): 0 'bgzip is not installed' lines across 235 notebook cells (v10 had 20/34/60 per notebook).

One minor regression exposed: Fix #4 makes tabix findable, which reveals a pre-existing bug where download_gencode leaves a stale .tbi file that coolbox's `tabix -p gff` rejects with "index file exists". Workaround: delete the .tbi; the NB1 retry succeeded. A proposed 3-line follow-up fix to annotations.py is documented in the report.

Also verified:
- 308/308 pytest on a fresh env (17.3 s)
- 6/6 oracle smoke (7 min 2 s) — first Enformer fresh install with a wiped tfhub cache
- 12/12 regen within AlphaGenome CPU non-determinism tolerance
- 0 orphan HTMLs after parallel regen
- 3 notebooks: 0 errors, 0 warnings, 0 bgzip spam
- 16/16 HTMLs clean in Selenium
- FTO README spot-check confirms Fix #3 committed correctly

After 11 audit passes, the last two have surfaced no actual chorus bugs — only environmental quirks (tfhub cache, SSL MITM, PATH inheritance, stale .tbi).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…efreshes GTF

The v11 audit exposed a pre-existing bug masked by the pre-Fix-#4 state in which tabix was not on PATH: when download_annotation refreshes a GTF, leftover coolbox artefacts (file.gtf.bgz + file.gtf.bgz.tbi from a previous session) point at byte offsets in the old .bgz that no longer match the new one. coolbox then calls tabix -p gff file.bgz on its next GTF() read, tabix refuses to overwrite without -f, and the notebook cell crashes with:

CalledProcessError: Command '['tabix', '-p', 'gff', ...]' returned non-zero exit status 1.

Fix: in AnnotationManager.download_annotation, after sort_annotation writes the fresh GTF, unlink any stale .bgz / .bgz.tbi / .gz.tbi sharing the same stem. coolbox then regenerates them cleanly on the first GTF() call. Three extra unlink() calls on a not-hot path.

Unit test: TestStaleGTFIndexCleanup in tests/test_error_recovery.py mocks requests.get + sort_annotation, primes the annotations dir with stale .bgz and .tbi files, and verifies both are removed after download_annotation returns.

Verified: pytest -m "not integration" → 309 passed (was 308).

Without this fix, a notebook run after any annotation refresh (download_gencode() called twice across sessions, or a newer GENCODE version pulled) hits the tabix error on the first coolbox visualization cell.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
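The cleanup step can be sketched as below; `unlink_stale_indexes` is a hypothetical helper name, with the suffix list taken from the message above.

```python
from pathlib import Path

def unlink_stale_indexes(gtf_path) -> None:
    # After the fresh GTF is written, drop leftover coolbox artefacts that
    # share its stem; coolbox/tabix then regenerate them cleanly on first use.
    gtf = Path(gtf_path)
    for suffix in (".bgz", ".bgz.tbi", ".gz.tbi"):
        Path(str(gtf) + suffix).unlink(missing_ok=True)  # cheap, not-hot path
```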
…tall

Full cold-start on /srv/local/lp698/chorus-audit-v9:
- 7 chorus envs wiped + rebuilt via chorus setup
- ~/.chorus + ~/.cache/huggingface wiped + re-downloaded
- 314/314 tests pass (308 code + 6 oracle smoke)
- 3 notebooks: 0 errors across 235 cells, 32 plots
- All 13 examples regenerated fresh
- Selenium: 16/16 HTML reports CLEAN

Includes the 2026-04-17_v9_scorched_earth_audit.md report with a scientific content review — every prediction cross-checked against published literature (Musunuru 2010 for SORT1, Bauer 2013 for BCL11A, Claussnitzer 2015 for FTO, etc.). All match textbook biology.

One bug found + fixed during the audit: the ChromBPNet nested-tar race (commit 7834d3c, merged earlier).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ance, multi-oracle

Every report HTML (variant, causal, discovery, region-swap, integration, batch, validation) now opens with a shared "How to read this report" glossary that names each layer's effect formula (log2FC / lnFC / Δ). Per-layer table headings carry a formula chip; summary strings cite the specific track and cell type that drove each headline number; the causal SORT1 report merges its old Rankings + Details sections into one expandable table with per-layer top-track provenance.

The new MultiOracleReport renders a cross-oracle consensus matrix for one variant scored by several oracles (example: SORT1 rs12740374 scored with ChromBPNet + LegNet + AlphaGenome), flagging where models agree or disagree on direction per layer.

Shared logic lives in chorus/analysis/_report_glossary.py so renderers stay consistent; VariantReport.from_dict enables the JSON rehydration used by scripts/rerender_examples.py to refresh HTML without re-running oracles.

14 new tests pin the new behaviour; all 333 suite tests pass.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…sensus
The consensus matrix previously rendered a single oracle's direction as
"all ↑" / "all ↓" — technically correct (all reporting oracles agree
trivially when there's one) but misleading to users who see "all ↑"
and assume multiple oracles concurred. v11 content-review audit
caught this on the committed SORT1 multi-oracle example, where
chrombpnet, legnet, and alphagenome are specialists — each reports a
different subset of layers — and the three "all ↑" rows in the matrix
were actually single-oracle votes.
Change:
- _consensus_rows now emits "single_gain" / "single_loss" when
exactly one oracle reports a direction, separate from
"consensus_gain" / "consensus_loss" which now require ≥2 oracles.
- Markdown renders as "only ↑ (n=1)" / "only ↓ (n=1)"; HTML uses
"↑ only (n=1)" / "↓ only (n=1)" with a new neutral-grey
.agree-single CSS class so users visually distinguish trivial
single-voter layers from real cross-oracle consensus.
- Existing "all ↑" / "all ↓" labels still fire when 2+ oracles agree.
Regenerated SORT1 rs12740374 multi-oracle example (it was the direct
case that exposed the bug): the three previously-"all ↑" single-voter
layers (TF binding, histone marks, CAGE — all AlphaGenome-only) now
correctly read "only ↑ (n=1)". The "disagree" chromatin row (AG vs
ChromBPNet) and any future ≥2-oracle consensus rows are unchanged.
Tests: +1 regression test (test_single_voter_layer_uses_n1_label_not_all)
that verifies both agreement dict values ("single_gain"/"single_loss")
and the user-visible strings in MD + HTML. All existing multi-oracle
tests still pass (they all use ≥2 oracles per layer, so the
consensus_gain / consensus_loss path is untouched).
Verified: pytest -m "not integration" → 325 passed (was 324).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
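The labelling rule above can be sketched as a small classifier — a hypothetical reduction of `_consensus_rows`, not its real signature:

```python
def classify_agreement(directions: list) -> str:
    # directions holds one effect sign per reporting oracle for a layer
    # (+1 = gain, -1 = loss). A single voter can never be "consensus".
    if len(directions) == 1:
        return "single_gain" if directions[0] > 0 else "single_loss"
    if all(d > 0 for d in directions):
        return "consensus_gain"      # >=2 oracles agree on gain
    if all(d < 0 for d in directions):
        return "consensus_loss"      # >=2 oracles agree on loss
    return "disagree"

print(classify_agreement([+1]))          # single_gain  -> "only ↑ (n=1)"
print(classify_agreement([+1, +1, +1]))  # consensus_gain -> "all ↑"
print(classify_agreement([+1, -1]))      # disagree
```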
The causal report's IGV renderer was writing signal tracks directly from raw oracle predictions with per-track autoscale, while the variant-report IGV was already running a PerTrackNormalizer floor-subtract + rescale to [0, 3.0] (1.0 = genome-wide p99 peak). That inconsistency meant two reports from the same run had incomparable y-axes.

Extract the rescale step into ``_igv_report.apply_floor_rescale`` and call it from both ``build_igv_html`` and ``_build_causal_igv``, so every IGV panel in every report uses the same scaling by default. Users who want raw dynamics can opt out via ``CausalResult._igv_raw`` (mirrors the existing ``VariantReport._igv_raw`` knob).

Regenerated the SORT1 causal example; 36 signal-track panels now use the scaled axis (min=0, max=3) instead of raw autoscale.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
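A sketch of the shared floor-subtract + rescale, assuming each track's genome-wide p99 peak is supplied by the caller (the real ``apply_floor_rescale`` signature may differ):

```python
import numpy as np

def apply_floor_rescale(signal: np.ndarray, p99_peak: float,
                        vmax: float = 3.0) -> np.ndarray:
    # Subtract the track's floor, normalize so 1.0 equals the genome-wide
    # p99 peak, and clip to [0, vmax] so every IGV panel shares one y-axis.
    floored = signal - signal.min()
    scaled = floored / p99_peak if p99_peak > 0 else floored
    return np.clip(scaled, 0.0, vmax)
```

With a fixed [0, 3] axis, a value of 1.0 reads as "a typical strong peak" in every panel, regardless of which oracle produced the track.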
…consensus The multi-oracle consensus matrix now renders 'only ↑ (n=1)' for layers covered by a single oracle and reserves 'all ↑' / 'all ↓' for ≥2 agreeing oracles — avoiding the misleading 'all ↑' badge on SORT1 TF/histone/CAGE rows where only AlphaGenome contributes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…order

Two small fixes to the rerender path:

- When a dir has one JSON but a non-default HTML filename (e.g. the Enformer variant_analysis report used ``rs12740374_...`` rather than the ``chr1_...`` default), the rerender now matches on oracle_name so we keep the existing name instead of writing an orphan file next to it.
- The multi-oracle consolidator loops oracles in the canonical order (specialists → generalist: chrombpnet, legnet, alphagenome), matching scripts/regenerate_multioracle.py, so consensus-matrix columns don't shuffle between a full regen and a pure-JSON rerender. It also synthesises a multi-oracle AnalysisRequest instead of reusing the first per-oracle one — otherwise the rendered prompt block read as a single-oracle run.

Re-rendered the multi-oracle example to confirm the output matches the full regen (only the timestamp differs).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First audit pass to deliberately exercise all three user-facing
modalities — Python library, committed examples, MCP server over
stdio — on the SAME variant and verify outputs are bit-identical.
Teardown: 14.2 GB (envs + ~/.chorus + HF models + tfhub cache).
Key result — cross-modality consistency:
Regenerating SORT1 rs12740374 analysis via (A) the Python library
regen scripts and (B) fastmcp Client calling chorus-mcp over
stdio produces:
Track     regen     MCP       Δ        labels
DNASE     +0.4315   +0.4315   0.0000   identical
CEBPA     +0.3712   +0.3712   0.0000   identical
CEBPB     +0.2822   +0.2822   0.0000   identical
H3K27ac   +0.1660   +0.1660   0.0000   identical
Numbers match to 4 decimal places and descriptions are byte-identical
("DNASE:HepG2", "CHIP:CEBPA:HepG2", etc.). Users moving between Python
scripts, the MCP server running under Claude, and the committed example
reports will not see any discrepancies.
Two findings, both environmental:
1. MEDIUM: Enformer regen re-creates f15d926-deleted files.
scripts/regenerate_examples.py ENFORMER_EXAMPLES still has two
entries targeting validation/SORT1_rs12740374_with_CEBP/chr1_...
that f15d926 deleted. On regen, 2 orphans appear in git status.
Fix: drop the two entries.
2. MEDIUM (recurring v10): SSL-MITM networks make the CDN fetch of
igv.min.js fail; v10 Fix #2's HF fallback can't activate because
igv.min.js is not yet uploaded to lucapinello/chorus-backgrounds.
When parallel regens start with a cold ~/.chorus/lib/ cache, the
earliest-written HTMLs get CDN <script> tags (4/18 this run).
Fix: one-time upload of igv.min.js to the HF dataset — code
already ready.
Other verified:
- 326/326 pytest on fresh env (18 s)
- 6/6 oracle smoke (7m 36s) — Enformer tfhub fresh download clean
- 12/12 examples regenerate within non-det tolerance
- 18/18 HTMLs carry the new "How to read" glossary (f15d926)
- 235 notebook cells across 3 NBs, 0 errors, 0 bgzip spam (Fix #4)
- v12 n=1 label fix live — multi-oracle single-voter rows now read
"only ↑ (n=1)" not "all ↑"
Deferred (same as v8-v11): Linux/CUDA, hosted deployment,
clinical validation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…files

Two v12-audit findings, both environmental / regen-script level.

#1 Drop f15d926-deleted entries from ENFORMER_EXAMPLES
=======================================================
scripts/regenerate_examples.py had two ENFORMER_EXAMPLES dicts that wrote to validation/SORT1_rs12740374_with_CEBP/chr1_..._enformer_*.html files that commit f15d926 deleted as redundant. After a fresh regen, git status showed 2 untracked orphan HTMLs. Removed the two dict entries; the SORT1_enformer/ directory already covers Enformer discovery for this variant.

#2 Bundle igv.min.js as a package resource
===========================================
Every chorus report inlines ~1.3 MB of IGV.js so the committed HTMLs are self-contained (offline-viewable, proxy-proof, air-gap-proof). Previously the file was lazy-downloaded from cdn.jsdelivr.net on first use; on SSL-MITM networks that download fails, and the v10 HF fallback can't activate because igv.min.js was never uploaded to lucapinello/chorus-backgrounds. When multiple regen scripts run in parallel with a cold ~/.chorus/lib/, the earliest HTMLs got CDN <script> tags (4/18 in the v12 audit).

New resolution order in _ensure_igv_local:
1. chorus/analysis/static/igv.min.js — bundled with the package. Always present in a standard install; no I/O, no network.
2. ~/.chorus/lib/igv.min.js — legacy cache from older installs.
3. CDN via stdlib urllib (existing).
4. HuggingFace dataset via huggingface_hub (existing).

The bundled file adds 1.3 MB to the package — noise next to the GB-scale oracle deps. setup.py package_data now ships analysis/static/*.js.

Tests: 2 new in TestIGVBundledResource
- test_bundled_igv_js_is_present_in_package — the wheel includes the file
- test_ensure_igv_local_returns_bundled_without_network — monkeypatches all network fallbacks to AssertionError and verifies the bundled path is returned without touching them.
Existing TestIGVFallbackViaHuggingFace tests updated to monkeypatch _IGV_BUNDLED to a missing path so the CDN → HF chain is still exercised (simulates a stripped install).

Verified:
- pytest -m "not integration" → 328 passed (was 326)
- Wiped ~/.chorus/lib/ + confirmed _ensure_igv_local returns the package-bundled path instantly with no network call.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
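The four-step resolution order might be sketched like this, with the network fetchers abstracted as callables (illustrative, not the real ``_ensure_igv_local`` signature):

```python
from pathlib import Path
from typing import Callable, Optional

def resolve_igv_js(bundled: Path, legacy_cache: Path,
                   cdn_fetch: Callable[[], Optional[Path]],
                   hf_fetch: Callable[[], Optional[Path]]) -> Optional[Path]:
    # 1. Package-bundled resource: always present in a standard install.
    if bundled.is_file():
        return bundled
    # 2. Legacy ~/.chorus/lib cache from older installs.
    if legacy_cache.is_file():
        return legacy_cache
    # 3./4. Network fallbacks (CDN, then HF dataset); either may return None.
    return cdn_fetch() or hf_fetch()
```

Because step 1 is a local stat, parallel regens with a cold cache can no longer race each other into the CDN path.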
…files Eliminates the cold-cache race where regens running in parallel produced HTMLs with CDN <script> tags instead of inlined IGV.js. Bundles igv.min.js as a package resource (chorus/analysis/static/igv.min.js) with resolution order bundled → legacy cache → CDN → HF. Also drops the two ENFORMER_EXAMPLES entries that wrote to files deleted in f15d926. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds audits/2026-04-20_v12_full_ux_consistency_audit.md documenting the first cross-modality audit pass (library regen vs MCP over stdio → bit-identical scores on SORT1 rs12740374) that uncovered the two v12 findings now fixed in the companion v12-polish merge. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…s + shared percentile format
v13 docs/content consistency sweep found two reports bleeding
through raw AlphaGenome catalog assay_ids instead of the enriched
display names used everywhere else in chorus. A user who saw
"CHIP:CEBPA:HepG2" in a variant report would see
"CHIP_TF/EFO:0001187 TF ChIP-seq CEBPA genetically modified…" in
the multi-oracle consensus matrix or the causal drill-down — an
inconsistency that's exactly the kind of polish issue that makes
users lose trust.
Affected reports:
1. MultiOracleReport (multi_oracle_report.py)
- _consensus_rows now captures ``description`` alongside assay_id
- MD render prefers description over raw assay_id
- HTML render same, plus percentile via _fmt_percentile ("≥99th"
instead of "+100.0%") — matches the format used by every other
chorus report
- Per-oracle drill-down table uses description, not <code>assay_id</code>
2. CausalResult HTML (causal.py)
- "Strongest track" line shows enriched label as the primary
user-facing name, raw assay_id demoted to a secondary <code>
tag only when description differs
- Per-layer breakdown table renders description instead of raw
assay_id
- Percentile column uses _fmt_percentile ("≥99th" / "near-zero"
instead of "+100.0%")
- IGV track labels use description so the panel matches the
table rows ("CHIP:CEBPA:HepG2 ref" / " alt" instead of the
60-char raw assay_id)
Regenerated SORT1 causal + SORT1 multi-oracle committed outputs:
- multi-oracle HTML: raw CHIP_TF references 3 → 0; enriched
references 1 → 4
- causal HTML: raw references 30 → 3 (remaining 3 are inside
IGV "name" attributes that the fix now also routes through
description — they read like "CHIP:CEBPA:HepG2 (#1 rs12740374)")
- "+100.0%" percentile format → 0; "≥99th" format → 39 (causal) + 8
(multi)
Test: tests/test_analysis.py::TestMultiOracleReport::
test_uses_enriched_description_not_raw_assay_id — asserts both the
MD and HTML render "CHIP:CEBPA:HepG2" and NOT
"TF ChIP-seq CEBPA genetically modified", and that the HTML carries
"≥99th" but no "+100.0%".
Verified: pytest -m "not integration" → 329 passed (was 328).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
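The shared percentile format can be illustrated with a small formatter; the cap thresholds and mid-range rendering here are assumptions inferred from the examples in the text, not chorus's actual `_fmt_percentile`:

```python
def fmt_percentile(p):
    # p is a quantile in [0, 1], or None when no background is available.
    if p is None:
        return "—"            # renderers show a dash, not a fake percentile
    if p >= 0.99:
        return "≥99th"        # cap: avoids overclaiming "+100.0%"
    if p <= 0.01:
        return "≤1st"
    return f"{round(p * 100)}th"

print(fmt_percentile(1.0))    # ≥99th
print(fmt_percentile(0.95))   # 95th
print(fmt_percentile(None))   # —
```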
v13 docs/content consistency sweep caught two reports bleeding through raw AlphaGenome catalog assay_ids instead of the enriched display names every other chorus report uses. A user seeing `CHIP:CEBPA:HepG2` in the SORT1 variant report would see `CHIP_TF/EFO:0001187 TF ChIP-seq CEBPA genetically modified…` (60-char raw assay_id) in the multi-oracle consensus matrix or the causal drill-down — exactly the polish issue that erodes trust.

What changed

MultiOracleReport (chorus/analysis/multi_oracle_report.py)
- _consensus_rows now captures description alongside assay_id
- MD and HTML renders prefer description over the raw assay_id
- Per-oracle drill-down table uses description, not <code>{assay_id}</code>
- Percentiles go through _fmt_percentile (≥99th / near-zero instead of +100.0% / -100.0%) — matches every other chorus report

CausalResult HTML (chorus/analysis/causal.py)
- "Strongest track" shows the enriched label as the primary name; the raw assay_id is demoted to a secondary <code> tag only when it differs
- Per-layer breakdown table renders description instead of <code>assay_id</code>
- Percentile column uses _fmt_percentile
- IGV track labels use description, so the IGV panel reads "CHIP:CEBPA:HepG2 ref" / "CHIP:CEBPA:HepG2 alt" instead of the 60-char raw AlphaGenome catalog id

Before / After on committed SORT1 examples
- raw `CHIP_TF/EFO:0001187` mentions → enriched `CHIP:CEBPA:HepG2` labels
- `"+100.0%"` percentile format → `"≥99th"` / `"near-zero"` format

Regenerated the committed SORT1 causal + SORT1 multi-oracle examples so the repo reflects the new labels.

Test

tests/test_analysis.py::TestMultiOracleReport::test_uses_enriched_description_not_raw_assay_id — asserts both MD and HTML render "CHIP:CEBPA:HepG2" and not "TF ChIP-seq CEBPA genetically modified"; also asserts the HTML carries "≥99th" but no "+100.0%".

Verified

pytest tests/ --ignore=test_smoke_predict.py -m "not integration" → 329 passed (was 328)
🤖 Generated with Claude Code