
audit: 2026-04-17 v10 fresh-install + content-review audit #20

Open
lucapinello wants to merge 65 commits into main from
audit/2026-04-17-v10-fresh-install-content-review

Conversation

@lucapinello
Contributor

Fresh-install audit at fbaef50 with deep content review of every example output — biology direction, summary-sentence-vs-table consistency, label coherence — not just pass/fail counters.

Four findings (all low-to-medium environmental)

  1. MEDIUM — TF Hub's /var/folders/.../tfhub_modules/ cache survives rm -rf ~/.chorus/. A stale partial download made the Enformer smoke test fail (neither 'saved_model.pb' nor 'saved_model.pbtxt' found). Clearing the cache fixes it; worth documenting in README Troubleshooting.

  2. MEDIUM (regression from v8) — On SSL-MITM networks, stdlib urllib fails to fetch cdn.jsdelivr.net/igv.min.js with a certificate-verification error; 6/16 HTMLs landed on the CDN <script> fallback. huggingface_hub's httpx+certifi stack works through the same proxy, so the robust fix is to mirror igv.min.js on the HF dataset and fall back there.

  3. LOW — FTO_rs1421085/README.md promises adipose tracks, but the example actually uses HepG2 (the prompt only asked for the "nearest metabolic" cell type).

  4. LOW — Notebooks emit 20–60 "[ERROR] bgzip is not installed" lines each when run via jupyter nbconvert without a preceding mamba activate. bgzip is in the env; the PATH just isn't inherited. Plots still render via the TabFileReaderInMemory fallback.
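The mirror-and-fall-back fix proposed in finding 2 could look roughly like this (a sketch only — the full CDN URL and the mirrored filename on the lucapinello/chorus-backgrounds dataset are assumptions, not the repo's actual report code):

```python
import urllib.error
import urllib.request
from pathlib import Path


def fetch_igv_js(
    cdn_url: str = "https://cdn.jsdelivr.net/npm/igv/dist/igv.min.js",
) -> bytes:
    """Try the CDN first; on SSL/network failure fall back to a copy
    mirrored on the HF dataset, fetched via huggingface_hub (whose
    certifi-backed client survived the same SSL-MITM proxy)."""
    try:
        with urllib.request.urlopen(cdn_url, timeout=30) as resp:
            return resp.read()
    except (urllib.error.URLError, OSError):
        # Hypothetical mirror location — requires igv.min.js to be
        # uploaded to the dataset first.
        from huggingface_hub import hf_hub_download

        path = hf_hub_download(
            repo_id="lucapinello/chorus-backgrounds",
            repo_type="dataset",
            filename="igv.min.js",
        )
        return Path(path).read_bytes()
```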

Verified (beyond numbers)

  • Biology direction on every named example matches literature: SORT1 (Musunuru 2010 CEBP gain), BCL11A (TAL1 disruption in K562), TERT (E2F1 + CAGE activation), region_swap (enhancer removal closes), integration_simulation (CMV promoter opens).
  • Summary sentences match their tables in all 12 MDs.
  • 6/6 CDFs pass empirical checks (first audit with all 6 verified empirically in one pass — sei + legnet were skipped in v6/v8).
  • Zero orphan HTMLs after parallel regen (v6 API fix still holds).
  • Notebook plots render correctly in all 3 notebooks despite bgzip error spam.
  • 303/303 pytest on fresh env (17.5 s).
  • 6/6 oracle smoke passes (after tfhub cache clear).

One content-review observation worth flagging

The discovery example (SORT1_cell_type_screen) ranks LNCaP, a prostate cancer line, as the #1 hit (+1.91 DNASE), above every liver cell type, even though rs12740374 is a known liver eQTL. The root cause is that LNCaP has a very low baseline SORT1 DNase signal at this locus, so the relative log2FC is inflated. A newcomer reading the README would be surprised; one added paragraph explaining this teaching moment would help.
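The inflation mechanism is easy to demonstrate with toy numbers (illustrative values only, not the example's actual signals):

```python
import math

# Same absolute DNase gain, two different baselines: a relative
# log2 fold-change explodes when the reference signal is near zero,
# which is the LNCaP situation at this locus.
gain = 0.5
for cell, baseline in [("high-baseline liver-like", 8.0),
                       ("low-baseline LNCaP-like", 0.15)]:
    log2fc = math.log2((baseline + gain) / baseline)
    print(f"{cell}: log2FC = {log2fc:+.2f}")
# high baseline  -> +0.09
# near-zero baseline -> +2.12
```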

Full report: audits/2026-04-17_v10_content_review_audit.md

🤖 Generated with Claude Code

lucapinello and others added 30 commits March 25, 2026 13:39
…und distributions

- New chorus/analysis module: multi-layer scoring (scorers.py), variant reports,
  quantile normalization, batch scoring, causal prioritization with enriched
  HTML tables (gene, cell type, per-layer score columns, top-3 IGV signal tracks),
  cell type discovery, region swap, integration simulation
- New chorus/analysis/build_backgrounds.py: variant effect and baseline signal
  background distributions for quantile normalization, with batch GPU scripts
- 8 application examples with full outputs: variant analysis (SORT1, TERT, BCL11A,
  FTO across AlphaGenome/Enformer/ChromBPNet), causal prioritization, batch scoring,
  cell type discovery, sequence engineering (region swap + integration simulation)
- Validation against AlphaGenome paper: SORT1 confirmed, TERT partially confirmed
  (ELF1 limitation documented), HBG2 not reproduced in K562 or monocytes (ISM vs
  log2FC methodology difference documented with side-by-side comparison)
- Fix mamba PATH resolution in environment runner and manager
- Add gene_name, cell_type fields to CausalVariantScore and BatchVariantScore
- 500 common SNPs BED file for background computation
- 91 tests covering all analysis components

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The annotation module was re-parsing the full 1GB GENCODE GTF file on every
call to get_genes_in_region, get_gene_tss, and get_gene_exons (~11s each).
Now the GTF is loaded once per feature type (gene/transcript/exon) and cached
as a DataFrame for the process lifetime. Exon lookups use a groupby index
for O(1) gene-name access.

Before: 11,000ms per query (full GTF scan)
After:  0.03s genes, 0.04s TSS, 1.5ms exons (cached)

Full analysis test suite now completes in 2 min (was timing out at 10+ min).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
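A minimal sketch of the caching pattern described above, assuming hypothetical function names (the module's real API differs): parse the GTF once per feature type, keep the DataFrame for the process lifetime, and pre-group exons by gene name for O(1) lookup.

```python
from functools import lru_cache

import pandas as pd

GTF_COLS = ["chrom", "source", "feature", "start", "end",
            "score", "strand", "frame", "attributes"]


@lru_cache(maxsize=None)
def load_gtf_features(path: str, feature: str) -> pd.DataFrame:
    """Parse the GTF once per feature type; cached for the process."""
    df = pd.read_csv(path, sep="\t", comment="#", names=GTF_COLS)
    return df[df["feature"] == feature].copy()


@lru_cache(maxsize=None)
def exon_index(path: str) -> dict:
    """Group exons by gene name once, giving O(1) per-gene access."""
    exons = load_gtf_features(path, "exon").copy()
    exons["gene_name"] = exons["attributes"].str.extract(
        r'gene_name "([^"]+)"', expand=False)
    return dict(tuple(exons.groupby("gene_name")))
```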
…ores

Single-process AlphaGenome script that extracts all 3,763 valid tracks
(711 cell types × 6 output types) from each forward pass. Same GPU time
as K562-only (~55 min total on A100) but yields comprehensive per-layer
distributions across all cell types.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comprehensive background distribution builder:
- 10K random SNPs from hg38 reference across all autosomes
- 20K protein-coding gene TSS positions (promoter state baselines)
- 5K random genomic positions (general baseline)
- Parallel GPU execution: --part variants --gpu 0 / --part baselines --gpu 1
- All 3,763 AlphaGenome tracks extracted per forward pass

Expected output: ~37M variant scores + ~94M baseline samples in ~18 hours.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…n counting

Covers all AlphaGenome output types:
- Window-based: DNASE, ATAC, CHIP_TF, CHIP_HISTONE, CAGE, PROCAP,
  SPLICE_SITES, SPLICE_SITE_USAGE
- Exon-counting: RNA_SEQ (sum across merged protein-coding exons per gene)
- All backgrounds unsigned (abs magnitude) for quantile ranking

Pre-loads GENCODE v48 gene annotations and builds spatial index for fast
exon lookup within prediction windows.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
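The two scoring modes can be sketched as follows (a simplification with hypothetical names — the real builder also handles per-layer formulas and merged protein-coding exons):

```python
import numpy as np


def window_score(track: np.ndarray, center: int, window: int) -> float:
    """Window-based summary: sum signal in a fixed window around the
    position (DNASE/ATAC/ChIP/CAGE-style layers)."""
    half = window // 2
    return float(track[max(0, center - half):center + half].sum())


def exon_score(track: np.ndarray, exons: list) -> float:
    """Exon-counting summary for RNA_SEQ: sum across exon intervals
    (start, end) that fall inside the prediction window."""
    return float(sum(track[s:e].sum() for s, e in exons))
```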
…haul

Analysis framework:
- PerTrackNormalizer with per-track CDFs (effect, activity, perbin) for all 6 oracles
- Auto-download backgrounds from HuggingFace on oracle load
- AnalysisRequest dataclass preserves user's original prompt on every report
- Magnitude-gated interpretation labels ("Very strong" requires |effect| > 0.7)
- Top-10-per-layer cap in markdown reports with truncation footer
- Biological interpretation + suggested next steps on all 14 example outputs
- Literature caveats where oracle predictions diverge from published biology

Bug fixes:
- Sequence.slice() missing self argument in interval.py (broke Enformer predictions)
- oracle_name="oracle" placeholder in region_swap, integration, discovery
- Cell-type column bloat in batch_scoring and causal (thousands of cell types)
- Corrupted igv.min.js in 15 HTML files from misplaced injection into JS string
- predict() called with string region instead of (chrom, start, end) tuple

MCP server:
- All 8 critical tools accept user_prompt and forward it into reports
- _safe_tool decorator returns structured {"error", "error_type"} on failure
- Improved docstrings: score_variant_batch (variant dict schema), discover_variant_cell_types (runtime + cell count), fine_map_causal_variant (composite formula + output columns)
- Causal table shows Top Layer column; batch scoring resolves track IDs to human-readable names

Application examples (14 folders, all with MD/JSON/TSV/HTML):
- Regenerated all variant_analysis, validation, discovery, causal, batch, sequence_engineering
- Every report has Analysis Request header + Interpretation section
- Cleaned stale intermediate files (5 removed)
- IGV browser verified working in headless Chrome

Documentation:
- README: "Start here" applications callout, updated MCP tools list, MCP walkthrough link
- API_DOCUMENTATION: application layer section (all 6 functions + AnalysisRequest)
- MCP_WALKTHROUGH.md: 5 example conversations showing natural-language usage
- Natural-language framing notes on all 7 category READMEs
- Fixed AlphaGenome HF URL, clarified environment.yml vs chorus-base.yml
- Notebook install banners for comprehensive/advanced (all 6 oracles required)

Scripts:
- Internal scripts moved to scripts/internal/
- regenerate_examples.py + regenerate_remaining_examples.py for reproducible output generation
- scripts/README.md updated with public script descriptions

Testing:
- 268 tests passed (including new magnitude-gate and causal-table tests)
- All 3 notebooks executed end-to-end (Enformer, all 6 oracles, multi-oracle analysis)
- IGV browser rendering verified via Selenium in headless Chrome
- MCP server startup verified

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
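The magnitude gate above can be sketched like this. Only the |effect| > 0.7 threshold for "Very strong" is stated in the commit; the lower cutoffs here are illustrative placeholders, not the repo's actual values.

```python
def interpretation_label(effect: float) -> str:
    """Magnitude-gated interpretation label (sketch)."""
    mag = abs(effect)
    if mag > 0.7:          # documented gate for "Very strong"
        return "Very strong"
    if mag > 0.3:          # illustrative
        return "Strong"
    if mag > 0.1:          # illustrative
        return "Moderate"
    return "Weak"
```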
These are replaced by the per-oracle build_backgrounds_*.py scripts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove stale outputs containing machine-specific paths (/Users/lp698/...)
and runtime-specific logs. Notebooks should be committed clean so new
users run them fresh in their own environment.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace /PHShome/lp698/chorus with REPO_ROOT (computed from __file__)
  in all 8 public scripts (6 build_backgrounds + 2 regenerate)
- Clear stale notebook output cells containing machine-specific paths

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ll traceability

Batch scoring:
- Per-track columns (one per assay:cell_type) with raw score + percentile
- track_scores dict preserved on BatchVariantScore for programmatic access
- display_mode parameter: "by_assay" (default), "by_cell_type", "summary"
- Track ID footnotes for tracing back to oracle data
- oracle_name parameter fixes normalizer CDF lookup (was returning None)

Causal prioritization:
- Per-track columns replacing generic "Max Effect / Top Layer"
- Each cell shows raw effect + percentile for each scored track
- track_scores dict on CausalVariantScore

Report infrastructure:
- report_title field: "Region Swap Analysis Report", "Integration Simulation Report"
- modification_region: IGV highlights full replaced/inserted region (not 2-3bp)
- modification_description: documents what was inserted/replaced and its length
- has_quantile scoping fix (UnboundLocalError on empty allele_scores)

All examples regenerated with biologically specific tracks:
- SORT1: HepG2 DNASE + CEBPA + CEBPB + H3K27ac + CAGE (reproduces Musunuru)
- BCL11A: K562 DNASE + GATA1 + TAL1 + H3K27ac + CAGE (reproduces Bauer)
- FTO: HepG2 tracks (nearest metabolic cell type available)
- TERT: K562 tracks
- Validation: forced HepG2 CEBP tracks matching the AlphaGenome paper

Every report carries the user's original prompt (Analysis Request block).
All 13 examples verified: MD + JSON + TSV + HTML, prompt present, 268 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…racle table

- Remove ChromBPNet loading and "Combining oracles" sections (belong in main README)
- Rename "Window" to "Output window" + add Resolution column
- Separate Effect percentile and Activity percentile explanations
- Add recommendation to start with AlphaGenome
- Remove Python API details (get_normalizer) that don't belong here

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- .mcp.json: drop the /data/pinello/... PATH hardcoding so new users can
  use the file as-is from `curl` in any environment. mamba resolves the
  chorus env without an explicit PATH override.
- README.md: add LDlink token setup section under Troubleshooting —
  fine_map_causal_variant auto-fetch path was silently failing for users
  without a free LDlink API key.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Changes informed by a fresh walkthrough of the README from the perspective
of a brand-new user:

- Reorder Installation: Fresh Install now comes before Upgrading (a first-
  time reader no longer sees "remove existing envs" before they install)
- Consolidate the Fresh Install block to cover env create, pip install,
  chorus setup --oracle enformer, and chorus genome download hg38 so a
  user copying the block ends up actually ready to run the Quick Start
- Clarify that the root environment.yml is what you install and the
  per-oracle YAMLs in environments/ are internal to `chorus setup`
- Quick Start: point to examples/single_oracle_quickstart.ipynb for users
  who prefer a notebook, and call out the setup prerequisite explicitly
- Annotate the ENCFF413AHU track ID in the DNase snippet so users know
  what it is before the Discovering Tracks section explains it
- HF_TOKEN: note that Claude Code inherits env from the shell where
  `claude` is started (the MCP server is spawned by that shell)
- Add a "Further reading" section linking the docs/ folder — previously
  API_DOCUMENTATION, METHOD_REFERENCE, VISUALIZATION_GUIDE, and
  IMPLEMENTATION_GUIDE were all invisible to a README-only reader
- Remove REQUIREMENTS_CHECKLIST.md (internal audit scratch file)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…gitignore

Real issues caught by a deeper audit pass and fixed:

- **Duplicate CAGE column headers**: batch scoring tables rendered two
  "CAGE:HepG2" columns because both + and - strand tracks have
  identical description fields. _track_display_name now appends (+) / (-)
  when the assay_id carries a strand suffix, producing unique column
  labels in markdown, HTML, and DataFrame outputs.
- **UnboundLocalError in _build_html_report**: has_quantile /
  has_baseline were defined inside a for-loop that doesn't execute for
  empty allele_scores, causing .to_html() to crash on minimally-populated
  reports. Initialise both before the loop (matches the markdown fix).
- **docs/RELEASE_CHECKLIST.md**: internal QA checklist with stale
  metrics (references 128 tests when we have 280). Removed — internal
  docs shouldn't live in user-facing docs/.
- **API_DOCUMENTATION / METHOD_REFERENCE overlap**: added reciprocal
  callouts clarifying that API_DOCUMENTATION is authoritative and
  METHOD_REFERENCE is a one-line cheat sheet.
- **logs/ not ignored**: 82 MB of run logs at risk of being committed.
  Added to .gitignore.

Test coverage added (+12 tests, 268 → 280 passed):
  - TestReportMetadataFields: report_title, modification_region,
    modification_description rendering in MD / HTML / dict
  - TestBatchDisplayModes: by_assay, by_cell_type, track-ID footnote,
    CAGE strand disambiguation in DataFrame columns
  - TestSafeToolDecorator: passthrough, exception → error dict,
    function name preservation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Columns and TSV headers now show CAGE:HepG2 (+) and CAGE:HepG2 (-)
instead of two identical CAGE:HepG2 columns.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
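The disambiguation logic amounts to something like the sketch below. How the strand suffix is encoded in the assay id is an assumption here (a trailing +/-); the real _track_display_name may parse it differently.

```python
def track_display_name(assay_id: str, description: str) -> str:
    """Append (+)/(-) when the assay id carries a strand suffix so two
    stranded tracks with identical descriptions get unique columns."""
    if assay_id.endswith("+"):
        return f"{description} (+)"
    if assay_id.endswith("-"):
        return f"{description} (-)"
    return description
```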
A new user could miss this entirely — the previous mention was a single
feature bullet ("auto-downloaded from HuggingFace") with no detail.
This section spells out:

- The backgrounds turn raw log2FC into the effect/activity percentiles
  shown in every report
- They're fetched on first oracle use from the public HF dataset
  lucapinello/chorus-backgrounds and cached in ~/.chorus/backgrounds/
- File sizes per oracle (so users with limited disk know what to expect)
- **No HF_TOKEN required** for backgrounds (only AlphaGenome model is gated)
- LDlink token is separate and only needed for causal auto-fetch
- Optional pre-download snippet for users who want to avoid the first-use
  wait

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds a comprehensive reference appendix covering:

- What the backgrounds are (effect %ile vs activity %ile vs per-bin)
  and why they exist (turn raw log2FC into genome-aware metrics)
- How they were calculated:
    * Variant effect distribution: 10K random SNPs × all tracks with
      layer-specific scoring formulas (log2FC, logFC, diff)
    * Activity distribution: ~31.5K positions (random intergenic +
      ENCODE SCREEN cCREs + protein-coding TSSs + gene-body midpoints)
    * Per-bin distribution: 32 random bins per position for IGV scaling
    * RNA-seq exon-precise sampling rule
    * CAGE summary routing rule
- Sample sizes per oracle (track count, samples per track, NPZ size)
- Python API usage with verified signatures (get_pertrack_normalizer,
  download_pertrack_backgrounds, effect_percentile, activity_percentile,
  perbin_floor_rescale_batch)
- MCP / Claude usage (auto-attached, zero-config)
- Documented ranges and a sanity-check rule of thumb for interpretation
- How to reproduce or extend the backgrounds via build_backgrounds_*.py

All function signatures in the appendix were verified against the actual
implementation before committing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Local-only IDE/agent state (settings, scheduled tasks lock) — per-
developer, not for the repo.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…pies

- AUDIT_PROMPT.md: systematic end-to-end audit script for a new machine,
  with REPLACE_* placeholders for HF_TOKEN and LDLINK_TOKEN.
- .gitignore: block any *_WITH_TOKENS.md or AUDIT_PROMPT_WITH_TOKENS*
  file from ever being staged, since filled-in copies contain secrets.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Records what worked and what did not on a fresh macOS 15.7.4 / arm64
clone of chorus-applications: full install, all 6 oracle smoke-predicts,
286/286 pytest pass, 3 example notebooks (0 errors), 22 MCP tools registered,
6/6 application tools producing correct outputs (rs12740374 SORT1 case
reproduces the published Musunuru-2010 finding), 18/19 application HTML
reports IGV-verified via headless Chrome, ChromBPNet smoke build completed
end-to-end on CPU.

Top issues a macOS user hits, ranked, with one-or-two-line fixes:
  1. No Apple GPU (MPS / Metal / jax-metal) auto-detect — frameworks are
     installed but borzoi/sei/legnet only check torch.cuda.is_available(),
     SEI hardcodes map_location='cpu', chrombpnet/enformer envs lack
     tensorflow-metal. Verified Borzoi runs on MPS in 4.3 s when forced.
  2. SEI Zenodo download via stdlib urllib at ~80 KB/s — 3.2 GB tar takes
     ~11 h. curl -C - -L recovers it in ~30 min.
  3. fine_map_causal_variant rsID-only crash (KeyError 'chrom' at
     causal.py:355). Workaround: pass "chr1:pos REF>ALT" form.
  4. Two-mamba-installs MAMBA_ROOT_PREFIX gotcha breaks chorus health.
  5. Notebooks need explicit `python -m ipykernel install --user --name chorus`.
  6. SEI download has no single-flight lock — concurrent inits race.

Verdict: production-ready with caveats. None of the issues block correctness;
all are operational or one-line code fixes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Addresses every actionable item in audits/2026-04-14_macos_arm64.md.
All changes are platform-conditional — Linux CUDA paths are unchanged.

PyTorch oracles (borzoi, sei, legnet) — auto-detect MPS on Apple Silicon
  - Both the in-process loader (chorus/oracles/{borzoi,sei,legnet}.py) and
    the subprocess templates ({borzoi,sei,legnet}_source/templates/{load,
    predict}_template.py) now resolve `device is None` (or the new 'auto'
    sentinel) as: cuda > mps > cpu. Linux + CUDA box hits the cuda branch
    first, no behavior change there.
  - SEI: replaced the hard `map_location='cpu'` device pin (the value is
    still used to load weights to host memory before .to(device), which is
    the standard pattern across torch versions and works for mps too).
  - Sei BSplineTransformation lazily moved its spline matrix only when
    `input.is_cuda`. Generalized to any non-CPU device so the matmul works
    on MPS as well. Verified: 286/286 pytest still pass.

TensorFlow oracles (chrombpnet, enformer) — Metal backend on Apple Silicon
  - chorus/core/platform.py macos_arm64 adapter now adds
    `tensorflow-metal>=1.1.0` to pip_add. Once installed, Apple's plugin
    registers a 'GPU' physical device, so the oracles' existing
    tf.config.list_physical_devices('GPU') auto-detect picks it up with no
    code change. Linux paths don't see the macos_arm64 adapter so CUDA stays
    intact.

JAX oracle (alphagenome) — unchanged
  - Already explicitly skips Metal in auto-detect (jax-metal still missing
    `default_memory_space` for AlphaGenome). README updated to document
    this trade-off.

MCP fix — fine_map_causal_variant rsID-only crash
  - Calling `fine_map_causal_variant(lead_variant="rs12740374")` previously
    raised KeyError: 'chrom' at chorus/analysis/causal.py:355 because
    `_parse_lead_variant("rs12740374")` returns {"id": ...} only.
  - Backfill chrom/pos/ref/alt onto the sentinel from the LDlink response
    (which always carries them) before invoking prioritize_causal_variants.
  - Verified end-to-end: rs12740374 ranked #1 with composite=1.000 of 12 LD
    variants on AlphaGenome (matches the published Musunuru-2010 finding).

SEI Zenodo download — chunked + resume + single-flight lock
  - Replaced urllib.request.urlretrieve with a stdlib chunked urlopen loop
    that supports HTTP Range resume and an fcntl exclusive lock so two
    concurrent SeiOracle inits don't race the same partial file. Original
    observed throughput on macOS was ~80 KB/s (would take ~11 hours for the
    3.2 GB tar); the new path resumes interrupted downloads and progress-
    logs every 100 MB.

README — macOS troubleshooting + Apple GPU policy table + kernel install
  - Documented the two-mamba-installs MAMBA_ROOT_PREFIX gotcha that breaks
    `chorus health` when the new chorus env lands in a different mamba root
    than the per-oracle envs.
  - Added the per-oracle macOS GPU support matrix (MPS / Metal / CPU) with
    explicit `device=` examples.
  - Added the missing `python -m ipykernel install --user --name chorus`
    step to Fresh Install so examples/*.ipynb find the chorus kernel.

Validation on macOS 15.7.4 / Apple Silicon (CPU + MPS + Metal):
  - 286/286 pytest pass (incl. all 6 oracle smoke-predict tests)
  - chorus.create_oracle('borzoi') auto-detects mps:0
  - chorus.create_oracle('sei')    auto-detects mps:0 + smoke-predict ok
  - chrombpnet env now reports tf.config.list_physical_devices('GPU') = [GPU:0]
  - fine_map_causal_variant(lead_variant='rs12740374') ranks rs12740374
    composite=1.000 of 12 LD variants

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
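The cuda > mps > cpu resolution described above boils down to the following order of checks (a dependency-free sketch — the boolean flags stand in for torch.cuda.is_available() and torch.backends.mps.is_available()):

```python
def resolve_device(requested=None, *, cuda_available=False,
                   mps_available=False) -> str:
    """An explicit device request wins; None or 'auto' resolves in the
    order cuda > mps > cpu, so Linux+CUDA behavior is unchanged."""
    if requested not in (None, "auto"):
        return requested
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"
```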
…EI resumable download, rsID backfill)

Verified on Linux CUDA: 285/285 code tests pass. The AlphaGenome smoke test errors due to an expired HF token in the chorus-alphagenome env (unrelated to this PR).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- tests/test_mcp.py: TestFineMapRsidBackfill verifies fine_map_causal_variant
  backfills chrom/pos/ref/alt when a caller passes only an rsID lead_variant.
  Regression test for the macOS audit crash (KeyError: 'chrom').
- examples/*.ipynb: Re-executed all three notebooks end-to-end on Linux CUDA
  to refresh outputs against the merged audit branch.

Full suite now: 286/286 tests pass (including alphagenome real-oracle smoke).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Regeneration on Linux CUDA GPU 1 (GPU 0 was full):
- AlphaGenome variant + validation (5 examples) — 28 min
- Remaining AlphaGenome (batch/causal/discovery/seq) — ~14 min
- Enformer SORT1 — 1 example
- ChromBPNet SORT1 — 1 example
Discovery HTML filenames now use oracle_name "alphagenome" (was placeholder "oracle").

Verification:
- 289/289 tests pass across combined runs (6 oracle smoke tests green on GPU 1)
- Selenium screenshot sweep: 18/19 HTML render cleanly; 1 "NO-IGV" is the
  batch_scoring HTML which is a scoring table by design (no browser view)
- Hardcoded /PHShome/lp698/chorus paths in notebook log outputs redacted to
  /path/to/chorus

.gitignore: ignore examples/applications/**/*_screenshot.png so selenium
artifacts don't pollute the repo.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…lper

Adds audits/2026-04-15_macos_arm64_post_merge.md — the second-pass
end-to-end audit on a fully wiped + fresh-cloned install, after the
PR #7 macOS-support changes were merged. Every fix from v1 is
confirmed working on a clean setup:

  * chrombpnet + enformer envs now pull in tensorflow-metal
    automatically → `Auto-detected 1 GPU(s) … name: METAL`
  * borzoi/sei/legnet auto-detect mps:0
  * fine_map_causal_variant("rs12740374") rsID-only returns
    rs12740374 composite=0.963 of 12 LD variants (was KeyError in v1)
  * analyze_variant_multilayer reproduces Musunuru-2010 biology
    (CEBPA strong binding gain +0.37, DNASE strong opening +0.43)
  * 286/286 pytest, 0 notebook errors, 19/19 IGV reports ok

Two download-reliability findings surfaced on this clean run (both
pre-existing, both same bug-class as the SEI fix that landed in
PR #7): chorus/utils/genome.py stalled at 36% of the hg38 download,
and chorus/oracles/chrombpnet.py has no single-flight lock so two
concurrent callers race the ENCODE tar and hit EOFError.

This commit also adds chorus/utils/http.py — the resume+lock helper
that previously lived inside SeiOracle, now extracted as a shared
stdlib-only utility so genome + chrombpnet can reuse it. The
sei.py helper shim keeps the old public API working.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…elper

Three call sites fetch large files from the public internet with plain
urllib.request.urlretrieve (no resume, no concurrency lock). The
2026-04-15 v2 audit on a fresh install hit two of them the hard way:
UCSC cut the hg38 connection at ~36% of the 938 MB download
(urllib.error.URLError: retrieval incomplete: got only 363743871 out
of 983659424 bytes), and two concurrent callers of
_download_chrombpnet_model raced the same partial ENCODE .tar.gz so
one read it mid-write and hit
  EOFError: Compressed file ended before the end-of-stream marker was
           reached
inside tarfile.extractall.

Re-use the resume+lock helper introduced for SEI in PR #7, lifted
into chorus/utils/http.py in the preceding commit:

  chorus/oracles/sei.py
    _download_with_resume staticmethod becomes a thin shim that
    forwards to chorus.utils.http.download_with_resume. No behaviour
    change and no API break.

  chorus/utils/genome.py
    GenomeManager.download_genome swaps urllib.request.urlretrieve
    for download_with_resume. Fixes the UCSC stall observed in the
    v2 audit; partial .fa.gz is now resumable across retries.

  chorus/oracles/chrombpnet.py
    _download_chrombpnet_model (ENCODE tar) and _download_jaspar_motif
    (JASPAR motif) both route through download_with_resume. The fcntl
    lock on <dest>.lock serialises concurrent callers so the pytest
    smoke fixture and a background build_backgrounds_chrombpnet.py
    job can no longer corrupt each other's download.

All three changes are platform-agnostic; the helper is stdlib-only
(urllib + fcntl). Linux CUDA is not touched.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
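The Range-resume plus fcntl single-flight pattern can be sketched as below. This is a minimal stdlib-only illustration of the technique the helper uses, not chorus.utils.http's actual implementation (no progress logging, retries, or size validation).

```python
import fcntl
import os
import urllib.request


def download_with_resume(url: str, dest: str, chunk: int = 1 << 20) -> None:
    """Chunked download with HTTP Range resume; an exclusive fcntl lock
    on <dest>.lock serialises concurrent callers so nobody reads a
    partial file mid-write."""
    with open(dest + ".lock", "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)      # single-flight
        if os.path.exists(dest):
            return                            # another caller finished it
        part = dest + ".part"
        offset = os.path.getsize(part) if os.path.exists(part) else 0
        headers = {"Range": f"bytes={offset}-"} if offset else {}
        req = urllib.request.Request(url, headers=headers)
        with urllib.request.urlopen(req) as resp, open(part, "ab") as out:
            while True:
                buf = resp.read(chunk)
                if not buf:
                    break
                out.write(buf)
        os.replace(part, dest)                # atomic publish
```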
…S audit

v2 audit confirmed post-merge macOS works end-to-end (286/286 tests,
19/19 HTML, reproduces Musunuru-2010 biology, rsID backfill verified).

Adds chorus/utils/http.py (resume + fcntl-lock) and routes hg38 genome +
chrombpnet ENCODE tar + JASPAR motif downloads through it. SEI helper
becomes a shim for backward-compat.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Third-pass audit going one level deeper than v1 (pre-merge smoke) and
v2 (post-merge fresh install). Scope: 14 application examples × 19 HTML
reports × 6 per-track normalizer NPZs + the scoring/normalization stack.

Read-only deliverable — the fixes identified here belong in a separate
follow-up PR after review.

Findings (5, ranked by severity):

  1. HIGH — chrombpnet_pertrack.npz:DNASE:hindbrain has 0 background
     samples and an all-zeros CDF. PerTrackNormalizer.effect_percentile()
     silently returns 1.0 for every raw_score (including 0.0) because
     np.searchsorted on a zeros row ranks everything at the end, and
     _get_denominator falls through to cdf_width=10000 when counts[idx]=0.
     Same bug class as the v2 concurrent-download race that landed in
     PR #8 — the hindbrain model download failed silently and left a
     zero-count reservoir. Impact: any variant scored against
     DNASE:hindbrain in ChromBPNet gets a false "100th percentile".

  2. MEDIUM — every committed HTML report loads igv.min.js from
     cdn.jsdelivr.net at view time. 2/19 reports flaked on
     net::ERR_CERT_AUTHORITY_INVALID during this audit; any user
     behind a corporate proxy / airgapped network / jsdelivr outage
     will see IGV silently fail with no fallback. No SRI either.

  3-5. LOW — documentation improvements:
     - TERT_promoter example doesn't caveat that C228T's published
       biology is melanoma-specific; K562 result (all negative) is
       correctly modelled but reads as "no effect" without context
     - AlphaGenome DNASE vs ChromBPNet ATAC disagree on rs12740374
       direction in HepG2 (+0.45 vs -0.11); no application note
       teaches this real cross-oracle divergence
     - HBG2_HPFH footer notes BCL11A/ZBTB7A catalog absence; could
       be tightened

Normalization stack verified clean:
  - CDF monotonicity: 0 bad rows across 18,159 tracks × 10,000 points
  - signed_flags match LAYER_CONFIGS.signed exactly (AG 667 RNA-seq,
    Borzoi 1543 stranded RNA, SEI 40/40 regulatory_classification,
    LegNet 3/3 MPRA; Enformer 0 signed is correct — no RNA-seq)
  - Build-vs-scoring window_bp bit-identical via shared LAYER_CONFIGS
  - Pseudocount/formula: _compute_effect reproduces reference
    implementation with diff=0.0 across all test cases
  - perbin_floor_rescale_batch math verified at all edges
  - Edge cases: unknown oracle → None, unknown track → None,
    raw=0 → 0.0, raw=huge → 1.0

Phase A rerun on 4 AlphaGenome literature-checked cases (SORT1, TERT,
FTO, BCL11A) confirms biology is preserved but results are NOT bit-
identical — raw_score drift ~1-2% on dominant tracks, larger quantile
swings on near-zero tracks due to AlphaGenome's JAX CPU non-
determinism. No committed example is stale. Noise-floor handling for
|raw_score| < ~1e-3 added to follow-up recommendation list.

Artifacts:
  - audits/2026-04-16_application_and_normalization_audit.md (main report)
  - audits/2026-04-16_screenshots/*.png (19 full-page PNGs)
  - audits/2026-04-16_data/*.json (per-app cards + normalization/selenium/rerun JSON)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
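The HIGH finding's failure mode is reproducible in a few lines of numpy (10,000 matches the CDF width mentioned in the audit):

```python
import numpy as np

# An all-zeros CDF row ranks every non-negative probe at the very end,
# which reads as "100th percentile" after dividing by the row width.
cdf_row = np.zeros(10_000)
for raw in (0.0, 0.5, 3.0):
    pct = np.searchsorted(cdf_row, raw, side="right") / cdf_row.size
    print(f"raw={raw}: percentile={pct}")  # 1.0 every time
```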
…ixes

Addresses the findings in audits/2026-04-16_application_and_normalization_audit.md
(PR #9). Three categories of change:

1. Delete two example applications the audit recommends removing:

   - examples/applications/variant_analysis/TERT_promoter/
     C228T is a melanoma-specific gain-of-function mutation; the example
     runs it in K562 (erythroleukemia) and shows all-negative effects.
     The biology is correct for the model but inverts the published
     direction. Rather than add a "wrong cell type" caveat, drop the
     example — SORT1 / FTO / BCL11A cover variant_analysis without
     teaching the reader a misleading result.

   - examples/applications/validation/HBG2_HPFH/
     Already self-documented as "Not reproduced" in
     validation/README.md: BCL11A / ZBTB7A aren't in AlphaGenome's track
     catalog, so the repressor-loss mechanism isn't visible. Keeping a
     "validation failed" example alongside the working
     SORT1_rs12740374_with_CEBP confuses readers. Drop it.

   Also updated: root README.md (replaces HBG2_HPFH link with
   SORT1_rs12740374_with_CEBP), examples/applications/variant_analysis/README.md
   (drops TERT prompt + section), examples/applications/validation/README.md
   (drops HBG2 row + section + reproduce snippet),
   scripts/regenerate_examples.py + scripts/internal/inject_analysis_request.py
   (both lose their TERT_promoter/HBG2_HPFH entries).

2. Normalizer: guard against zero-count CDF rows
   (chorus/analysis/normalization.py).

   Audit finding #1 (HIGH): the committed chrombpnet_pertrack.npz has
   DNASE:hindbrain with effect_counts[idx] == 0 and a zero-filled CDF
   row. effect_percentile() / activity_percentile() silently returned
   1.0 for every raw_score (including 0.0) because np.searchsorted on
   a zeros row returns len(row) for any non-negative probe and the
   denominator falls through to cdf_width. Same bug-class as the v2
   chrombpnet concurrent-download race that landed in PR #8 — the
   hindbrain ENCODE tar must have failed to extract cleanly during the
   original background build.

   New private helper _has_samples() returns False when
   counts[idx] == 0, which makes _lookup / _lookup_batch return None.
   Callers already render None as "—" in MD/HTML tables, so users now
   see "no background" instead of a silent false "100th percentile".
   Counts-less NPZs (older format, no counts field) are treated as
   valid — no regression.
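   A minimal sketch of the guard, assuming illustrative names
   (_has_samples, cdf_row, counts) rather than chorus's actual
   internals; stdlib bisect_right mirrors np.searchsorted(...,
   side="right") on a sorted row:

   ```python
   from bisect import bisect_right

   def _has_samples(counts, idx):
       """False when the background CDF row was built from zero samples."""
       if counts is None:        # older NPZ format without a counts field
           return True           # treated as valid, so no regression
       return counts[idx] > 0

   def effect_percentile(cdf_row, counts, idx, raw_score):
       if not _has_samples(counts, idx):
           return None           # callers already render None as "—"
       # On an all-zeros row, bisect_right returns len(row) for any
       # non-negative probe: the source of the silent "100th percentile".
       pos = bisect_right(cdf_row, abs(raw_score))
       return pos / len(cdf_row)
   ```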

3. Report: suppress quantile_score when raw_score is in the noise floor
   (chorus/analysis/variant_report.py).

   Audit finding #6 (LOW): when |raw_score| < 1e-3 the effect CDF is
   so densely clustered around 0 that a 1-2% raw-score drift can swing
   the quantile by 0.5+ (observed in the Phase A rerun: committed
   quantile=1.0 vs rerun=0.21 for a CEBPB track with raw_score ~1e-4).
   Set quantile_score = None in that regime so the HTML/MD tables
   render "—" and readers don't misread noise as signal. Threshold
   chosen conservatively to cover both log2fc (pc=1.0) and logfc RNA
   (pc=0.001) without hiding real effects.
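   The suppression rule reduces to a small check; this is a hypothetical
   helper mirroring the logic described for _apply_normalization, not
   the actual implementation:

   ```python
   NOISE_FLOOR = 1e-3  # threshold from the description above

   def suppress_noise_quantile(raw_score, quantile_score):
       # Below the noise floor the effect CDF is too dense for the
       # quantile to be stable, so drop it; tables render None as "—".
       if abs(raw_score) < NOISE_FLOOR:
           return None
       return quantile_score
   ```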

4. IGV.js: lazy-download the bundle into ~/.chorus/lib on first use
   (chorus/analysis/_igv_report.py + chorus/analysis/causal.py).

   Audit finding #2 (MEDIUM): reports embed a <script src="..."> to
   cdn.jsdelivr.net that gets evaluated every time the HTML is opened
   in a browser. Any viewer on an airgapped network / corporate proxy
   that MITMs TLS / during a jsdelivr outage sees IGV silently fail
   (2/19 audit reports hit ERR_CERT_AUTHORITY_INVALID). The local-
   cache code path already existed but was opt-in (user had to drop a
   file in ~/.chorus/lib/igv.min.js manually).

   New _ensure_igv_local() helper runs on the first report generation
   and populates the cache via chorus.utils.http.download_with_resume
   (the helper that landed in v2 PR #8). Reports written after the
   first successful download inline the JS directly — self-contained
   HTML that opens anywhere without network. Download failure is
   logged at WARNING and the CDN <script> tag is used as fallback,
   preserving the current behaviour for anyone who can't reach
   jsdelivr at generation time.
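   The lazy-cache flow can be sketched as follows; the `download`
   parameter stands in for chorus.utils.http.download_with_resume
   (signature assumed to be download(url, dest_path)), and the CDN URL
   is an assumption:

   ```python
   import logging
   from pathlib import Path

   log = logging.getLogger(__name__)
   IGV_CDN_URL = "https://cdn.jsdelivr.net/npm/igv/dist/igv.min.js"

   def ensure_igv_local(download, cache_dir=None):
       """Return a cached igv.min.js path, or None to use the CDN tag."""
       cache = Path(cache_dir or Path.home() / ".chorus" / "lib") / "igv.min.js"
       if cache.exists():
           return cache                  # already cached: inline the JS
       cache.parent.mkdir(parents=True, exist_ok=True)
       try:
           download(IGV_CDN_URL, cache)
           return cache
       except Exception as exc:          # any fetch failure
           log.warning("igv.min.js download failed (%s); "
                       "falling back to CDN <script> tag", exc)
           return None                   # caller keeps current behaviour
   ```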

All changes are platform-agnostic; all 287 pytest tests continue to
pass; the fixes are verified behaviourally:

  >>> norm.effect_percentile('chrombpnet', 'DNASE:hindbrain', 0.0)
  None                      # was: 1.0
  >>> norm.effect_percentile('chrombpnet', 'DNASE:HepG2', 0.0)
  0.0                       # unchanged

  >>> ts = TrackScore(raw_score=0.0005, ...)
  >>> _apply_normalization(ts, ...); ts.quantile_score
  None                      # noise floor

See audits/2026-04-16_application_and_normalization_audit.md (PR #9)
for full context, per-app screenshots, and the Phase A / B / C
methodology behind each finding.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
lucapinello and others added 28 commits April 16, 2026 04:08
From the 2026-04-16 new-user usability audit:

HIGH:
- H1: Fix wrong FTO coordinate in variant_analysis/README.md
  (53800954 → 53767042, matching all other references)
- H2: Add @_safe_tool to all 22 MCP tools (was missing on 14;
  unhandled exceptions now return structured error dicts)
- H3: Add htslib to environment.yml (provides bgzip for coolbox)
- H4: Fix python_requires to >=3.10 in setup.py (code uses 3.10+
  syntax like str | None)

MEDIUM:
- M2: Add README.md to marquee SORT1_rs12740374 example directory
- M8: Harmonize discover_variant to accept alt_alleles: list[str]
  (was singular alt_allele, inconsistent with other discovery tools)
- M9: Fix upgrade instruction order in README (remove oracle envs
  before removing the base chorus env that provides the CLI)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Second-pass usability audit findings:

1. Badge colors now match interpretation labels (was: red "Minimal effect"
   badges because _score_color_class used percentile directly; now derives
   class from the interpretation string which applies raw-score gating)

2. IGV CHIP track names now show TF/mark (was: all "CHIP:HepG2"; now uses
   _track_description enrichment for "CHIP:CEBPA:HepG2" etc.)

3. Percentile display: ≥99th / ≤1st instead of "1.000" / "0.000" to
   avoid implying false precision when the background CDF is saturated
   (correct behavior — random SNPs mostly have near-zero effects)

4. getting_started() MCP prompt now recommends high-level tools first
   (analyze_variant_multilayer, discover_variant, score_variant_batch,
   fine_map_causal_variant) instead of only listing low-level primitives

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Second-pass usability audit cleanup:

C5: Discovery sub-reports now carry AnalysisRequest with user prompt
    (regen script patches each per-cell-type report)
C6: Borzoi targets file: strip /home/drk/tillage/datasets/human/ prefix
    from file column (upstream training paths, not used at inference)
C7: HTML <title> now includes report_title + gene_name + position
    (e.g. "Multi-Layer Variant Report — SORT1 — chr1:109274968")

P8: Remove scripts/internal/ from repo (8 machine-specific dev scripts)
P9: Remove audits/2026-04-16_screenshots/ (~18 MB of PNGs)
    Both added to .gitignore to prevent re-commit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
BLOCKS_USER:
- Add missing __init__.py to 5 oracle source/template dirs (borzoi,
  enformer, alphagenome templates) — non-editable pip install would
  fail with ModuleNotFoundError
- Fix setup.py package_data: replace invalid ../environments/* escape
  with proper per-package data globs + data_files for env YAMLs

CONFUSING:
- Misspelled oracle name now raises ValueError listing valid names
  (was: misleading "not yet implemented" message)
- LegNet cell_types in list_tracks: add WTC11 (was: only HepG2, K562)
- list_tracks unknown oracle now lists valid names in error
- README: fix AlphaGenome track count 5,930 → 5,731 (actual loaded)
- README: fix Sei class count 41 → 40
- README: fix Borzoi track count 7,610 → 7,611
- README: fix oracle.list_tracks() → list_tracks() MCP tool

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Cherry-picked from audit/2026-04-16-fresh-install-v4 (preserving our
second/third-pass audit fixes which that branch didn't have).

The macOS CPU-forcing guard in AlphaGenome's load and predict templates
only fired when device was None or started with "cpu". Callers passing
device='cuda:0' bypassed it, letting jax-metal initialize and crash
with "UNIMPLEMENTED: default_memory_space".

Fix: on Darwin, always force JAX_PLATFORMS=cpu unless the caller
explicitly requests Metal. Applied to all three code paths:
- alphagenome.py:_load_direct (env var set before import jax)
- load_template.py
- predict_template.py

Includes macOS v4 audit report: clean-slate install, 6 oracle GPU
verification, 12 example regenerations, 3 notebooks (0 errors), 13
HTML Selenium checks, 7-check normalization audit (all pass).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Walked through Chorus as 4 personas (clinician, bioinformatician,
PhD student, computational biologist). Key changes:

For clinicians:
- Add "Key terms" glossary box at top of README (oracle, track,
  assay_id, effect percentile, log2FC — defined before first use)
- Reword Start-here table: "I have a variant but don't know the
  relevant tissue" instead of bioinformatics framing
- Add Interpretation sections to SORT1, BCL11A, FTO example outputs
  with clinical/biological narrative (LDL cholesterol, sickle cell,
  tissue-specificity explanation)

For bioinformaticians:
- Add VCF parsing snippet to batch scoring README
- Document oracle_name param for normalization
- Note AlphaGenome full track IDs vs short names

For contributors:
- Replace Borzoi→mymodel throughout CONTRIBUTING.md (Borzoi is
  already implemented, was confusing)
- Update Current Priorities to reflect all 6 oracles done

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
User feedback: batch scoring only showed the effect (log2FC + percentile)
per track, hiding the absolute ref and alt values. A +0.4 effect could
mean 10→14 (active region) or 0.001→0.0014 (noise) — impossible to tell
without seeing both alleles.
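The ambiguity is easy to see numerically: the same log2 fold change
arises from very different absolute signal levels.

```python
import math

# Identical ~+0.49 log2FC from an active region and from noise-level
# signal; the effect column alone cannot distinguish them.
active = math.log2(14 / 10)           # 10 -> 14
noise = math.log2(0.0014 / 0.001)     # 0.001 -> 0.0014
assert abs(active - noise) < 1e-9
```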

All three output formats now show full transparency per track:

- TSV/DataFrame: columns _ref, _alt, _log2fc, _effect_pctile, _activity_pctile
  (was: _raw, _pctile, _activity)
- Markdown: 4 sub-columns per track (Ref | Alt | log2FC | Effect %ile)
- HTML: grouped header with colspan, same 4 sub-columns per track

Also uses ≥99th / ≤1st percentile display from the earlier audit fix.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 new READMEs:
- causal_prioritization/SORT1_locus/
- validation/SORT1_rs12740374_with_CEBP/
- validation/TERT_chr5_1295046/

2 new interpretation sections:
- SORT1_chrombpnet: notes cross-oracle comparison with AlphaGenome
- SORT1_enformer: notes cross-tissue DNASE pattern + 114kb window

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All 12 application examples regenerated on GPU 0 with:
- AlphaGenome: 4 variant + 1 validation + batch + causal + discovery +
  2 seq engineering + TERT validation (28 min + 15 min)
- Enformer: 3 SORT1 examples (5 min)
- ChromBPNet: 1 SORT1 example (2 min)

New in regenerated outputs:
- ≥99th / ≤1st percentile display (no more "1.000")
- Badge colors match interpretation labels in HTML
- IGV CHIP track names show TF/mark (CHIP:CEBPA:HepG2)
- HTML titles include report_title + gene + position
- Self-contained igv.min.js (no CDN dependency)
- Batch scoring TSV: expanded ref/alt/log2fc/pctile columns

Re-added interpretation sections to 5 variant analysis examples
(SORT1, BCL11A, FTO, SORT1_chrombpnet, SORT1_enformer) after
regeneration overwrote them.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause: discover_variant_effects() writes HTML internally BEFORE
analysis_request is patched in regen scripts. Fixed by re-writing HTML
after patching, targeting the exact filenames.

Removed 6 stale HTML files:
- 3 discovery sub-reports (per-cell-type) that duplicated the main
  discovery report but lacked user prompt
- 1 orphaned CELSR2 validation report (unreferenced)
- 1 enformer validation duplicate (enformer_report.html, kept RAW_autoscale)
- 1 enformer discovery duplicate in SORT1_enformer dir

Final screenshot audit: 13/13 HTML reports CLEAN — all have Analysis
Request with user prompt, correct badge colors, enriched CHIP track
names, self-contained IGV, and ≥99th percentile display.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… docstring

From the read-only v5 audit (PR #12) — 3 low-severity findings:

1. MEDIUM: tests/test_analysis.py referenced old batch_scoring column
   names ('_raw', '_pctile') that commit 01d8446 renamed to '_ref',
   '_alt', '_log2fc', '_effect_pctile', '_activity_pctile'. Updated
   assertions to match current scheme; added checks for _ref and _alt
   that weren't previously verified. 279 → 281 passing.

2. MEDIUM: scripts/regenerate_examples.py hardcoded
   "N HepG2/K562 tracks" regardless of actual cell type. Added
   "cell_type" to each of the 4 AlphaGenome variant examples, and
   the tracks_requested string is now derived from that field. Also
   mechanically patched the 4 affected committed example outputs
   (SORT1, BCL11A, FTO, SORT1_CEBP) in MD/JSON/HTML so readers see
   the correct per-example label (e.g. "6 K562 tracks" for BCL11A)
   without waiting for the next regen. All 12 occurrences removed.

3. MINOR: chorus/analysis/batch_scoring.py:82 docstring still listed
   the old _raw / _pctile columns. Updated to reflect current output
   schema.

Verified:
- pytest tests/ --ignore=tests/test_smoke_predict.py → 281 passed
- Selenium re-render of the 4 patched HTMLs confirms the new labels
  appear and no "HepG2/K562" stragglers remain.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…hrough

From the post-v5-merge audit:

C1: MCP server tracks_requested now derives cell-type label dynamically
    (matches regen script output). Added _describe_tracks_requested
    helper that inspects the variant_result to extract cell types;
    labels uniform cell-type sets as "N HepG2 tracks" and mixed as
    "N tracks". Applied to all 5 tools using the pattern.

C2/C3: Added 15 new tests covering _fmt_percentile (≥99th boundary),
    _score_color_class (interpretation-label-based color), resolved
    device detection (nvidia-smi probe), and _describe_tracks_requested
    (uniform vs mixed cell-type labels). Test count 235 → 250.

P2: MCP_WALKTHROUGH.md — added "Manage loaded oracles" tip covering
    oracle_status and unload_oracle. Fixed stale 5,930 → 5,731 track
    count for AlphaGenome.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Root cause (2 layers):

1. discover_variant_effects and discover_and_report wrote HTML to
   output_path BEFORE AnalysisRequest could be attached (since the
   functions didn't accept one). Regen scripts then attached the
   prompt post-hoc and did a second report.to_html() under a pretty
   filename, leaving the first HTML behind as an orphan.
2. Commit df7d613 deleted the 3 per-cell-type discovery HTMLs from
   the repo, but those files are the actual drill-down output of
   discover_and_report — they were mislabeled as duplicates. There
   is no 'main' discovery HTML; the per-cell-type HTMLs are it.

Clean fix (no glob+remove hack):

- discovery.py: discover_variant_effects gains `analysis_request`
  and `output_filename` kwargs; discover_and_report gains
  `user_prompt` and `tool_name` kwargs. When provided, the
  AnalysisRequest is attached before the first HTML write.
- regenerate_examples.py::regenerate_enformer_discovery and
  regenerate_remaining_examples.py::{regen_discovery, regen_tert_chr5}
  now pass the AnalysisRequest + pretty filename in. The post-hoc
  report.to_html() rewrite and the per-cell-type for-loop that
  patched analysis_request by glob are removed.
- The 3 legitimate discovery HTMLs are re-committed to the repo
  with the user prompt baked in on first write.
- tests/test_analysis.py: 2 new tests verify the new kwargs exist.
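The shape of the fix, attaching the request metadata before the single
HTML write, can be sketched as follows; all names are illustrative, not
the actual discovery.py signatures:

```python
def discover_and_report(variant, user_prompt=None, tool_name=None,
                        output_filename=None, write_html=None):
    """Toy sketch: the AnalysisRequest is attached BEFORE the first
    (and only) HTML write, so no orphan file under a default name
    is ever produced."""
    report = {"variant": variant}
    if user_prompt is not None:
        report["analysis_request"] = {"prompt": user_prompt,
                                      "tool": tool_name}
    filename = output_filename or f"{variant}_report.html"
    if write_html is not None:
        write_html(report, filename)   # single write, pretty filename
    return report, filename
```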

Verified:
  - pytest 298 passed (was 296)
  - SORT1_enformer dir contains only rs12740374_SORT1_enformer_report.html
    (no orphan chr*.html)
  - All 3 discovery sub-reports render the "Screen all cell types…"
    prompt in their Analysis Request section
  - `git status` after a fresh regen is clean (no untracked files)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Read-only audit from a first-time user's perspective. The install
path, Minimal Working Example, MCP walkthrough, and CLI all work
out of the box; four low-to-medium polish items identified, all
documentation drift:

- MEDIUM: `chorus list` shows phantom `base` oracle as "✗ Not
  installed" because it scans environments/chorus-*.yml including
  chorus-base.yml (which the install instructions tell users NOT
  to install directly).
- MEDIUM: docs/MCP_WALKTHROUGH.md:38 uses wrong kwarg
  `alt_allele="T"` (singular string); actual signature is
  `alt_alleles: list[str]`. The next example in the same file
  uses the correct form.
- LOW: examples/advanced_multi_oracle_analysis.ipynb cell 1 has
  stale subtitle "using the Enformer oracle" copy-pasted from
  single_oracle_quickstart.ipynb (this is the multi-oracle NB).
- LOW: SORT1_rs12740374/README.md "Key results" table shows
  percentiles 99/98/95/90/88 but example_output.md has all five
  tracks at ≥99th after the v5 _fmt_percentile update.

Verified clean: README Minimal Working Example runs verbatim;
all 10 MCP tools introspect with docstrings + user_prompt; error
messages (`Unknown oracle`, `Valid names`, etc.) are actionable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four polish fixes from the first-user UX audit (PR #16):

1. MEDIUM: `chorus list` no longer shows a phantom `base` entry.
   EnvironmentManager.list_available_oracles() now filters out
   chorus-base.yml (an internal template — the user-facing base
   env is 'chorus' from root environment.yml). Added a guard in
   `chorus setup --oracle base` that prints a helpful message
   pointing to `mamba env create -f environment.yml` instead of
   silently trying to create a chorus-base env. Test added in
   tests/test_core.py.

2. MEDIUM: docs/MCP_WALKTHROUGH.md:38 — fixed `alt_allele="T"`
   (wrong, singular string) → `alt_alleles=["T"]` (plural, list)
   to match the actual MCP tool signature. The adjacent Example 2
   on line 70 already used the correct form; now both match.

3. LOW: examples/advanced_multi_oracle_analysis.ipynb cell 1 —
   replaced the stale "using the Enformer oracle" subtitle
   (copy-pasted from single_oracle_quickstart) with a proper
   multi-oracle description that matches the title.

4. LOW: examples/applications/variant_analysis/SORT1_rs12740374/
   README.md — "Key results" table had stale graduated
   percentiles (99/98/95/90/88); current example_output.md has
   all five tracks at ≥99th after the v5 _fmt_percentile update.
   Refreshed the table with current effect sizes and enriched
   track names (CHIP:CEBPA:HepG2 etc.), added an explanation of
   why the top bucket shows as "≥99th".

Verified: pytest 299 passed (was 298); `chorus list` output no
longer includes the phantom 'base' entry; `chorus setup --oracle
base` prints the friendly error message.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Clean-slate audit after v7 UX fixes merged. Deleted 13 GB (7 mamba
envs, ~/.chorus/, HF chorus models) before starting; preserved only
hg38.fa and the repo-internal ChromBPNet/Sei/LegNet weights.

First full audit with zero findings. Every v5/v6/v7 fix held up:

- v5: batch_scoring columns + cell-type label + docstring → live
- v6: discovery API kwargs (analysis_request, output_filename,
  user_prompt) → live and verified zero-orphan after fresh regen
  (git status --short | grep ^?? returns empty)
- v7: phantom `base` filter + walkthrough alt_alleles + NB3 subtitle
  + SORT1 README ≥99th percentiles → live

Results:
- pytest: 299/299 passed on fresh base env (17.7 s)
- smoke: 6/6 oracles pass in 6 min 1 s (AG+Borzoi re-downloaded from HF)
- regen: all 12 examples reproduce within AG CPU non-det tolerance
  (max Δeff 0.036, ChromBPNet 0.0001, Enformer 0.000)
- notebooks: 129 code cells across 3 NBs, 0 errors, 0 warnings, 0 stale
- HTML: 16/16 reports audit clean in Selenium (0 SEVERE, 0 CDN)
- CDFs: 4/6 downloaded, all pass monotonicity + counts + p50/p95/p99

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Full cold-start audit at /srv/local/lp698/chorus-audit-v7:
- All 7 chorus envs wiped and rebuilt from scratch via chorus setup
- hg38 + HF backgrounds re-downloaded fresh
- 307/307 tests pass (301 code + 6 oracle smoke)
- 3 notebooks: 0 errors across 235 cells
- All 13 examples regenerated (AG + Enformer + ChromBPNet)
- Selenium: 16/16 HTML reports CLEAN

Includes 2026-04-17_v7_scorched_earth_audit.md report.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Screenshot review found causal prioritization MD+HTML still used the
old '(100%)' / '(95%)' percentile format. The variant_report tables
and batch_scoring tables already used '≥99th' / '0.95' / '≤1st' via
_fmt_percentile (added in the audit pass). Causal was the last
remaining site.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two scary-looking warnings surfaced while reading notebook cell outputs
in the v7 audit. Neither is a real problem but both alarm users:

1. chorus/core/base.py:323 — case-sensitive compare of reference allele
   vs genome. pyfaidx returns lowercase for softmasked (repetitive)
   regions; users always pass uppercase. The previous code fired
   'Provided reference allele is not the same as the genome reference'
   on every variant in a softmasked locus (e.g. GATA1 TSS in quickstart
   notebook cell 39, comprehensive notebook cells 35 and 51). Now uses
   .upper() on both sides; also includes the actual allele pair in the
   warning message so users can confirm.

2. chorus/core/result.py:104 — 'Unknown implementation' warning fired
   for every Sei track (Stem cell / Multi-tissue / H3K4me3 etc.) that
   isn't in the hardcoded assay_type registry. The generic fallback
   works correctly; the warning was just noise. Downgraded to
   logger.debug.
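   Fix 1 amounts to a case-insensitive comparison; a minimal sketch of
   the behaviour (function name and signature are illustrative, not the
   actual chorus/core/base.py code):

   ```python
   import warnings

   def check_reference_allele(provided, genome_base):
       # pyfaidx returns lowercase bases in softmasked (repetitive)
       # regions, so a naive equality test fired on every variant in
       # such a locus; compare case-insensitively and report the pair.
       if provided.upper() != genome_base.upper():
           warnings.warn(
               f"Provided reference allele {provided!r} is not the same "
               f"as the genome reference {genome_base!r}"
           )
           return False
       return True
   ```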

Scientific review of outputs:
- SORT1 rs12740374: predictions match Musunuru 2010 mechanism (CEBPA/B
  binding gain, DNASE opening, H3K27ac gain, CAGE TSS increase) ✓
- BCL11A rs1427407: TAL1 binding loss + DNASE closing in K562 ✓
- FTO rs1421085: minimal effects in HepG2 (expected — adipose tissue) ✓
- TERT chr5:1295046 T>G: E2F1 binding gain + TERT TSS CAGE increase ✓
- SORT1 causal: rs12740374 ranks #1 composite=0.964 ✓

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ght CI

v8 found zero action items but called out four scenarios it did not
exercise. This PR closes them.

Fast suite (299 → 303): tests/test_error_recovery.py
- HF download ConnectionError → graceful return + warning log
- download_with_resume .partial file resume via Range header
- AlphaGenome missing HF_TOKEN → friendly actionable error
- Missing oracle env → "chorus setup" hint + graceful fallback
All four use mocks, run in ~1.5 s, no network.

Integration suite (gated by @pytest.mark.integration):
tests/test_integration.py
- SEI + LegNet CDF download from HF dataset (v8 didn't trigger these
  because no regen workflow uses sei/legnet) — verified the NPZs
  load and pass monotonicity + p50/p95/p99 + counts checks
- ChromBPNet ATAC:K562 fresh download from ENCODE (~500 MB, 8 min)
  to a tmp dir — verifies the shared download_with_resume helper
  end-to-end without touching the 37 GB real cache
- First E2E test of the MCP server: spawn chorus-mcp stdio
  subprocess via fastmcp Client, call list_oracles + load_oracle +
  analyze_variant_multilayer on SORT1 rs12740374 with real
  AlphaGenome predict (~4.5 min)

Light CI: .github/workflows/tests.yml
- Runs fast suite on every PR and push to main/chorus-applications
- Linux ubuntu-latest + Miniforge + mamba + pip install -e .
- Skips smoke tests (~10 GB models exceed runner disk) and
  integration tests (too slow for per-PR feedback)
- workflow_dispatch for manual maintainer runs

pytest.ini: registers the `integration` marker.

Verified:
- pytest -m "not integration" → 303 passed, 4 deselected (58.9 s)
- pytest -m integration → 4 passed (13 min total)
- Fresh ATAC:K562 tarball streamed with .partial + fcntl lock, fold 0
  weights loaded into TF, 2114 bp predict returns finite values
- chorus-mcp subprocess round-trips analyze_variant_multilayer
  response back to the client

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…unts

Full sweep of user-facing docs after the multi-pass audit series. 11 fixes:

BLOCKS_USER (wrong biology in example READMEs):
- variant_analysis/SORT1_chrombpnet/README.md: was claiming "+0.441 Strong
  opening", actual is "-0.111 Moderate closing". Replaced with correct
  values + cross-oracle divergence explanation.
- sequence_engineering/region_swap/README.md: entire scenario was wrong
  (promoter swap at chr1:1000500 vs actual SORT1 enhancer replacement at
  chr1:109274500). Rewrote from scratch to match actual example_output.md.
- sequence_engineering/integration_simulation/README.md: wrong direction
  (DNASE -0.900 vs actual +4.22), wrong filename. Rewrote.
- variant_analysis/SORT1_enformer/README.md: stale HepG2-focused numbers
  but actual is discovery-mode. Rewrote with top hits from current output.

CONFUSING (stale numbers):
- causal_prioritization/README.md:116: composite 0.898 → 0.964
- batch_scoring/README.md example table: old 4-column format → new
  per-track Ref/Alt/log2FC/Effect %ile format
- causal SORT1_locus/example_output.md: (100%) → (≥99th) via in-place
  patch (percentile format now consistent with other reports)
- validation/README.md:33,58: stale TERT CAGE +0.120 → +0.34

Track count drift (pick-one):
- 5,930 / 5930 → 5,731 across docs/API_DOCUMENTATION.md,
  docs/variant_analysis_framework.md, README.md, examples/applications/
- 230 bp → 200 bp for LegNet input size across same files

POLISH:
- docs/IMPLEMENTATION_GUIDE.md tree: removed "# Placeholder" markers on
  borzoi/chrombpnet/sei (all implemented); added analysis/ and mcp/
  directories; updated utils/ to reflect current files.
- multilayer_variant_analysis.md: "Quantile range" header → "Effect
  percentile range" to match the rest of the repo's terminology.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Full fresh-install audit at fbaef50 after wiping 13.2 GB (7 mamba envs,
~/.chorus/, HF chorus models). This pass focused on reading the
actual content of every example output, not just error counts.

Four findings, all low-to-medium environmental:

1. MEDIUM: TF Hub's /var/folders/.../tfhub_modules/ cache survives
   chorus teardowns. A stale partial download from a prior session
   made Enformer smoke fail with "'saved_model.pb' nor
   'saved_model.pbtxt'". Clearing it fixes; README should document.

2. MEDIUM (regression from v8): On this audit's SSL-MITM'd network,
   stdlib urllib in download_with_resume fails on
   cdn.jsdelivr.net/igv.min.js with cert-verify error. 6/16 HTMLs
   landed on the CDN <script> fallback. huggingface_hub's httpx
   works through the same proxy — robust fix is to mirror
   igv.min.js on the HF chorus-backgrounds dataset and fall back
   there when urllib fails.

3. LOW: FTO README promises adipose tracks but example runs with
   HepG2 (documented in the prompt only, not the README).

4. LOW: Notebooks run via `jupyter nbconvert` without
   `mamba activate chorus` emit 20-60 `bgzip is not installed` ERROR
   lines per notebook from coolbox. bgzip IS in the env — PATH just
   isn't inherited. Plots render via in-memory fallback; user sees
   scary error spam.

Verified:
- 303/303 fast pytest (17.5 s)
- 6/6 oracle smoke (after tfhub clear)
- 12/12 examples regenerate, max Δeff 0.036 (CPU non-det)
- 0 orphan HTMLs after parallel regen (v6 API fix live)
- 3 notebooks execute, 0 errors, plots render (despite bgzip noise)
- 16/16 HTMLs show correct biology in spot-check vs literature
- 6/6 CDFs pass monotonicity/p50/p95/p99/counts
  (first audit to empirically verify sei + legnet CDFs;
   v9 integration test now automates)

Biology confirmed on: SORT1 rs12740374 (Musunuru 2010 CEBP mechanism),
BCL11A rs1427407 (TAL1 disruption in K562), TERT chr5:1295046
(E2F1 + CAGE activation), region_swap (enhancer removal closes
chromatin), integration_simulation (CMV insertion opens chromatin).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lucapinello pushed a commit that referenced this pull request Apr 17, 2026
…ip PATH

All 4 findings from audit PR #20. Fast suite 303 → 308.

1. MEDIUM: TF Hub corrupt-cache recovery (enformer.py + load_template.py)
   When an earlier Enformer download was interrupted, tfhub_modules/
   keeps a directory with no saved_model.pb. hub.load then raises with
   the bad path in the message. Added _load_enformer_with_tfhub_recovery
   that parses the error, wipes the bad dir, and retries once. Applied
   to both in-process (_load_direct) and subprocess (template) paths.
   README Troubleshooting now documents the manual workaround and the
   auto-recovery behavior.

2. MEDIUM (v8 regression): IGV JS fallback via huggingface_hub
   (_igv_report.py). On SSL-MITM networks stdlib urllib rejects the
   proxy cert and CDN fetch of igv.min.js fails. Added secondary
   fallback via hf_hub_download from lucapinello/chorus-backgrounds
   — huggingface_hub uses httpx+certifi which works through the same
   proxies that block urllib. Graceful no-op if the HF file doesn't
   exist yet (existing CDN <script> fallback still kicks in). Dataset
   upload of igv.min.js to the HF repo is a separate one-time task
   for the maintainer; until then this code path silently downgrades
   to the current behavior.

3. LOW: FTO README adipose claim vs HepG2 reality
   (examples/applications/variant_analysis/FTO_rs1421085/README.md).
   Rewrote the Tracks section to accurately state that the committed
   example uses HepG2 as a "nearest metabolic" proxy and shows what a
   no-signal call looks like. Included ready-to-use adipose-track
   assay_ids for users who want the biologically ideal run.

4. LOW: bgzip/tabix PATH when nbconvert skips mamba activate
   (chorus/__init__.py). Prepend sys.executable's bin/ to PATH at
   chorus import time. coolbox's subprocess calls to bgzip/tabix now
   succeed instead of emitting 20-60 ERROR lines per notebook and
   falling back to TabFileReaderInMemory. Cheap and idempotent
   (only prepended if not already present).
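   The PATH prepend in fix 4 is a few lines; a sketch of the
   import-time behaviour (the real code lives inline in
   chorus/__init__.py rather than in a named function):

   ```python
   import os
   import sys
   from pathlib import Path

   def ensure_env_bin_on_path():
       # Prepend the interpreter's bin/ directory so subprocess calls
       # (bgzip, tabix) resolve even when the env was never activated.
       # Idempotent: only prepended if not already present.
       env_bin = str(Path(sys.executable).parent)
       current = os.environ.get("PATH", "")
       if env_bin not in current.split(os.pathsep):
           os.environ["PATH"] = env_bin + os.pathsep + current
   ```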

Tests: 5 new in tests/test_error_recovery.py
- test_corrupt_cache_is_cleared_and_retry_succeeds
- test_unrelated_errors_propagate_unchanged
- test_hf_fallback_when_cdn_fails
- test_returns_none_when_both_fail
- test_env_bin_on_path_after_import

Verified: pytest -m "not integration" → 308 passed (was 303).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>