v16: fix drifted oracle specs + broken walkthroughs/ link in notebooks #30
Merged
lucapinello merged 1 commit into chorus-applications on Apr 21, 2026
Walked every notebook cell output and screenshotted eight shipped HTML
reports. Three doc-drift bugs a new user would trip on:
- comprehensive_oracle_showcase.ipynb / advanced_multi_oracle_analysis.ipynb
cell 0: the "See walkthroughs/" link pointed at the now-renamed
applications/ dir. Retargeted to ../walkthroughs/.
- comprehensive_oracle_showcase.ipynb: "5,930 tracks across 11 modalities"
in three places didn't match what the running notebook prints
(5,731 tracks / 7 assay types). Also added PRO-CAP to the overview
assay list so it matches list_assay_types() output.
- advanced_multi_oracle_analysis.ipynb cell 9: Enformer "context window
of 196 kbp" — the library reports sequence_length=393,216. Corrected
to 393 kbp (matches the comprehensive notebook).
Also flagged (NOT fixed here, needs a focused code PR) an off-by-one in
predict_variant_effect's ref-allele check in chorus/core/base.py:322-328:
extract_sequence('chr1:109274968-109274968') returns 'G' (dbSNP/UCSC
agree), but region_interval[ref2query(109274968, ref_global=True)]
returns 'T' (the base at 109274969). Consequence: every shipped
walkthrough logs a harmless-looking warning and substitutes the ref/alt
one base off. Walkthroughs still agree with published mechanism for
rs12740374 because the signal is coherent across ±1 bp. Details in
audits/2026-04-21_v16_notebook_and_html_audit.md.
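
A mismatch of this shape is what you get when a 1-based genomic position is used with an index that is shifted by one. The following is a minimal sketch of that pattern on a toy sequence. The sequence, coordinates, and helper names are hypothetical and are not taken from chorus/core/base.py, and it does not establish the actual cause of the bug there.

```python
# Toy illustration only: REGION_START, REGION_SEQ, and both helpers are
# hypothetical and do not come from chorus/core/base.py.
REGION_START = 101       # 1-based genomic coordinate of the first base
REGION_SEQ = "ACAGTCCA"  # bases at positions 101..108


def base_at(pos: int) -> str:
    """Return the base at a 1-based genomic position inside the region."""
    return REGION_SEQ[pos - REGION_START]


def base_at_off_by_one(pos: int) -> str:
    """Buggy variant: the index is shifted one base downstream."""
    return REGION_SEQ[pos - REGION_START + 1]


def ref_matches(pos: int, ref: str, lookup=base_at) -> bool:
    """Check a variant's reference allele against the sequence."""
    return lookup(pos) == ref


if __name__ == "__main__":
    print(base_at(104), base_at_off_by_one(104))
    print(ref_matches(104, "G"), ref_matches(104, "G", base_at_off_by_one))
```

In the toy region, position 104 holds `G` and position 105 holds `T`, so the shifted lookup fails the ref-allele check for a variant whose reference allele is correct.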
Tests: 333 passed / 1 skipped (fast suite, 8m 38s).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Read every cell output of the three library notebooks and screenshotted 8 of the shipped HTML reports via headless Chrome. Found three drifts a new user would trip on, plus one latent code bug I'm not fixing here but flagging for a focused follow-up.
Fixed
- `applications/` link in `comprehensive_oracle_showcase.ipynb` and `advanced_multi_oracle_analysis.ipynb` cell 0. The dir was renamed to `walkthroughs/` in `340f30e`; the URL still pointed at the old path. Retargeted to `../walkthroughs/`.
- `comprehensive_oracle_showcase.ipynb`: the overview table, section 6 preamble, and the Operation 8 comment all said `5,930 tracks across 11 modalities`. The running notebook prints `Loaded 5731 AlphaGenome tracks` and `Assay types: 7 ['ATAC', 'CAGE', 'CHIP', 'DNASE', 'PRO_CAP', 'RNA', 'SPLICE_SITES']`. Corrected to `5,731` / `7 assay types` and added PRO-CAP to the assay list.
- `advanced_multi_oracle_analysis.ipynb` cell 9: Enformer context window corrected from `196 kbp` to `393 kbp` to match `oracle.sequence_length` and the comprehensive notebook's summary table. Also `5.313` → `5,313` (a period had been used as the thousands separator).
Flagged, not fixed (needs its own PR)
`predict_variant_effect` ref-allele check is off by one (`chorus/core/base.py:322-328`). `extract_sequence('chr1:109274968-109274968')` returns `'G'` (matches dbSNP/UCSC); `region_interval[ref2query(109274968, ref_global=True)]` returns `'T'` (the base at 109274969). Every shipped walkthrough logs a harmless-looking warning and substitutes the ref/alt one base off. Published mechanism for rs12740374 still reproduces because the regulatory signal is coherent across ±1 bp, which is why this has flown under the radar. Full reproduction trace in `audits/2026-04-21_v16_notebook_and_html_audit.md`.
Test plan
- `pytest tests/ --ignore=tests/test_smoke_predict.py -q` → 333 passed / 1 skipped (8m 38s)
- `python -c 'import nbformat; [nbformat.read(p, 4) for p in …]'`: all 3 notebooks round-trip clean
- `grep '5,930\|11 modalities\|196 kbp\|applications/)' examples/notebooks/*.ipynb` → no matches
🤖 Generated with Claude Code
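
The stale-string grep from the test plan could also be run as a small Python check that reports which cell each hit is in. This is a sketch under stated assumptions: it parses the `.ipynb` files as JSON with the standard library instead of `nbformat`, the name `stale_hits` is hypothetical, and it scans cell sources only, whereas the grep above scans the whole file including outputs.

```python
import json
import re
from pathlib import Path

# Same four patterns as the grep in the test plan, in Python regex syntax.
STALE = re.compile(r"5,930|11 modalities|196 kbp|applications/\)")


def stale_hits(nb_path):
    """Return (cell_index, matched_text) for each stale string in cell sources."""
    nb = json.loads(Path(nb_path).read_text(encoding="utf-8"))
    hits = []
    for i, cell in enumerate(nb.get("cells", [])):
        src = cell.get("source", "")
        # The notebook JSON format stores source as a string or a list of lines.
        if isinstance(src, list):
            src = "".join(src)
        hits.extend((i, m.group(0)) for m in STALE.finditer(src))
    return hits


if __name__ == "__main__":
    for path in sorted(Path("examples/notebooks").glob("*.ipynb")):
        for idx, text in stale_hits(path):
            print(f"{path}: cell {idx}: {text}")
```

An empty result for every notebook corresponds to the "no matches" outcome reported above, restricted to cell sources.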