v16: fix drifted oracle specs + broken walkthroughs/ link in notebooks#30

Merged
lucapinello merged 1 commit into chorus-applications from audit/2026-04-21-v16-deep-notebook-html-pass
Apr 21, 2026

Conversation

@lucapinello
Contributor

Summary

Read every cell output of the three library notebooks and screenshotted 8 of the shipped HTML reports via headless Chrome. Found three drifts a new user would trip on, plus one latent code bug I'm not fixing here but flagging for a focused follow-up.

Fixed

  • Broken applications/ link in comprehensive_oracle_showcase.ipynb and advanced_multi_oracle_analysis.ipynb cell 0. The dir was renamed to walkthroughs/ in 340f30e; the URL still pointed at the old path. Retargeted to ../walkthroughs/.
  • AlphaGenome spec drift in comprehensive_oracle_showcase.ipynb — overview table, section 6 preamble, and the Operation 8 comment all said 5,930 tracks across 11 modalities. The running notebook prints Loaded 5731 AlphaGenome tracks and Assay types: 7 ['ATAC', 'CAGE', 'CHIP', 'DNASE', 'PRO_CAP', 'RNA', 'SPLICE_SITES']. Corrected to 5,731 / 7 assay types and added PRO-CAP to the assay list.
  • Enformer context-window typo in advanced_multi_oracle_analysis.ipynb cell 9 (196 kbp → 393 kbp) to match oracle.sequence_length and the comprehensive notebook's summary table. Also 5.313 → 5,313 (decimal separator).
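
One way to keep figures like these from drifting again is to render them from the values the library reports instead of typing them by hand. The sketch below is only an illustration: `format_count` and `format_kbp` are hypothetical helpers, not part of chorus, and the inputs are the numbers quoted in this PR (5,731 tracks, sequence_length=393,216).

```python
def format_count(n: int) -> str:
    """Render an integer with thousands separators, e.g. 5731 -> '5,731'."""
    return f"{n:,}"


def format_kbp(n_bp: int) -> str:
    """Render a length in base pairs as whole kbp, truncating the remainder."""
    return f"{n_bp // 1000} kbp"


# Values quoted in this PR; in a notebook they would come from the loaded oracle.
print(format_count(5731))   # track count
print(format_kbp(393216))   # Enformer sequence_length
```

Using `,` in the format spec also avoids the decimal-separator slip, since the separator is never typed manually.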

Flagged, not fixed — needs its own PR

predict_variant_effect ref-allele check is off by one (chorus/core/base.py:322-328). extract_sequence('chr1:109274968-109274968') returns 'G' (matches dbSNP/UCSC); region_interval[ref2query(109274968, ref_global=True)] returns 'T' (the base at 109274969). Every shipped walkthrough logs a harmless-looking warning and substitutes the ref/alt one base off. Published mechanism for rs12740374 still reproduces because the regulatory signal is coherent across ±1 bp, which is why this has flown under the radar. Full reproduction trace in audits/2026-04-21_v16_notebook_and_html_audit.md.
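
For readers unfamiliar with this class of bug, here is a toy sketch of one common way a one-base shift can arise: mixing a 1-based genomic position with a window start that is treated as 0-based. It is not the chorus code and does not claim to be the actual cause in chorus/core/base.py. The function names and the flanking bases in `WINDOW` are invented for illustration; only the G at 109274968 and the T at 109274969 come from the trace above.

```python
# Toy illustration only: none of these names come from chorus.
WINDOW_START = 109274966   # 1-based coordinate of WINDOW[0] (assumed)
WINDOW = "ACGTA"           # invented flanks; G at ...968 and T at ...969 as reported


def base_at_correct(window: str, start_1based: int, pos_1based: int) -> str:
    """Return the base at a 1-based position inside a window whose
    first character sits at the 1-based coordinate start_1based."""
    offset = pos_1based - start_1based  # 0-based index into the string
    return window[offset]


def base_at_shifted(window: str, start_1based: int, pos_1based: int) -> str:
    """Same lookup, but the start is converted to 0-based while the query
    position is not, so the index lands one base to the right."""
    offset = pos_1based - (start_1based - 1)
    return window[offset]


print(base_at_correct(WINDOW, WINDOW_START, 109274968))  # base at the queried position
print(base_at_shifted(WINDOW, WINDOW_START, 109274968))  # base at the position after it
```

A regression test for the follow-up PR could assert that the two lookup paths agree on a known position such as this one.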

Test plan

  • pytest tests/ --ignore=tests/test_smoke_predict.py -q → 333 passed / 1 skipped (8m 38s)
  • python -c 'import nbformat; [nbformat.read(p, 4) for p in …]' — all 3 notebooks round-trip clean
  • grep '5,930\|11 modalities\|196 kbp\|applications/)' examples/notebooks/*.ipynb → no matches
  • Screenshotted 8 HTML reports at 1400×3000; verified glossary / formula chips / tables render correctly
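
The grep step above could also be expressed as a small Python check, for example as a regression test. This is a minimal sketch using only the standard library; `stale_hits` is a hypothetical helper, and unlike the grep (which scans the whole file, outputs included) it only looks at each cell's `source`.

```python
import json
import re
from pathlib import Path

# Same patterns as the grep in the test plan, written as one Python regex.
STALE = re.compile(r"5,930|11 modalities|196 kbp|applications/\)")


def stale_hits(notebook: dict) -> list[tuple[int, str]]:
    """Return (cell index, matched text) for every stale string found
    in the source of a notebook given as a parsed JSON dict."""
    hits = []
    for idx, cell in enumerate(notebook.get("cells", [])):
        source = cell.get("source", "")
        # nbformat stores source either as a string or as a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        for match in STALE.finditer(text):
            hits.append((idx, match.group(0)))
    return hits


if __name__ == "__main__":
    # Directory taken from the grep command above; prints nothing when clean.
    for path in sorted(Path("examples/notebooks").glob("*.ipynb")):
        nb = json.loads(path.read_text(encoding="utf-8"))
        for idx, text in stale_hits(nb):
            print(f"{path}: cell {idx}: {text!r}")
```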

🤖 Generated with Claude Code

…ghs/ link in notebooks

Walked every notebook cell output and screenshotted eight shipped HTML
reports. Three doc-drift bugs a new user would trip on:

- comprehensive_oracle_showcase.ipynb / advanced_multi_oracle_analysis.ipynb
  cell 0: the "See walkthroughs/" link pointed at the now-renamed
  applications/ dir. Retargeted to ../walkthroughs/.
- comprehensive_oracle_showcase.ipynb: "5,930 tracks across 11 modalities"
  in three places didn't match what the running notebook prints
  (5,731 tracks / 7 assay types). Also added PRO-CAP to the overview
  assay list so it matches list_assay_types() output.
- advanced_multi_oracle_analysis.ipynb cell 9: Enformer "context window
  of 196 kbp" — the library reports sequence_length=393,216. Corrected
  to 393 kbp (matches the comprehensive notebook).

Also flagged (NOT fixed here, needs a focused code PR) an off-by-one in
predict_variant_effect's ref-allele check in chorus/core/base.py:322-328:
extract_sequence('chr1:109274968-109274968') returns 'G' (dbSNP/UCSC
agree), but region_interval[ref2query(109274968, ref_global=True)]
returns 'T' (the base at 109274969). Consequence: every shipped
walkthrough logs a harmless-looking warning and substitutes the ref/alt
one base off. Walkthroughs still agree with published mechanism for
rs12740374 because the signal is coherent across ±1 bp. Details in
audits/2026-04-21_v16_notebook_and_html_audit.md.

Tests: 333 passed / 1 skipped (fast suite, 8m 38s).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lucapinello lucapinello merged commit 016413d into chorus-applications Apr 21, 2026
1 check passed