
updated schema#71

Merged: AyushSemwal merged 5 commits into main from dev on Apr 17, 2026

@AyushSemwal
Member

No description provided.

New commands

barcode_correct — standalone Levenshtein correction against a whitelist; resumable, optional inline demux via
--run-demux.
demux_reads — standalone FASTA/FASTQ export from annotations (demuxed or bulk).
generate_whitelist — whitelist-free cell-barcode discovery via knee detection + deletion-neighborhood near-dup merging.
qc_metrics — annotation + BAM-level QC, knee plots, boxplots, MultiQC TSVs, HTML report. ~3.9k lines, no PyArrow.
assess_model — per-segment accuracy assessment for trained models with a generated report.
featurecounts — gene-level count matrix from per-cell BAMs.
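
As a rough illustration of what whitelist-based Levenshtein correction involves, here is a minimal sketch. It is not the code in barcode_correct: the function names, the `max_distance` default, and the rule of rejecting ties are all assumptions made for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via row-wise DP."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def correct_barcode(observed, whitelist, max_distance=1):
    """Return the unique closest whitelist barcode within max_distance, else None.

    Hypothetical helper: ambiguous hits (two candidates at the same best
    distance) are rejected rather than guessed.
    """
    best, best_dist, tie = None, max_distance + 1, False
    for candidate in whitelist:
        d = levenshtein(observed, candidate)
        if d < best_dist:
            best, best_dist, tie = candidate, d, False
        elif d == best_dist and d <= max_distance:
            tie = True
    return None if best is None or tie else best
```

This brute-force scan is linear in the whitelist size per read, so a real implementation would likely index the whitelist; the sketch only shows the matching rule.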
Pipeline restructure

You can run barcode_correct and demux_reads against existing annotations — useful for resuming, re-running with different thresholds, or the whitelist-free flow.
annotate_reads and visualize accept preprocessed directories directly.
User-configurable bin width in preprocessing.
Option to split concatenated reads during annotation.
Checkpoint/resume across annotate_reads, barcode_correct, and demux_reads, with optional chunk cleanup. annotations_valid.parquet is auto-removed once the corrected file exists.
Barcode correction handles UMI-less protocols and arbitrary barcode-type combos. All FASTA/FASTQ outputs gzipped.
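
The checkpoint/resume behaviour described above can be pictured with a small sketch. This is an assumed design, not the pipeline's implementation: the JSON checkpoint file, the `run_with_checkpoints` name, and the `(chunk_id, payload)` shape are hypothetical.

```python
import json
from pathlib import Path


def run_with_checkpoints(chunks, process, checkpoint_path):
    """Process (chunk_id, payload) pairs, skipping ids already checkpointed.

    Returns the list of chunk ids processed during this call.
    """
    checkpoint_path = Path(checkpoint_path)
    done = set()
    if checkpoint_path.exists():
        # Resume: load the ids completed by a previous run.
        done = set(json.loads(checkpoint_path.read_text()))

    processed_now = []
    for chunk_id, payload in chunks:
        if chunk_id in done:
            continue  # finished in an earlier run
        process(chunk_id, payload)
        done.add(chunk_id)
        # Write to a temp file and rename, so an interrupted write
        # cannot leave a truncated checkpoint behind.
        tmp = checkpoint_path.with_suffix(".tmp")
        tmp.write_text(json.dumps(sorted(done)))
        tmp.replace(checkpoint_path)
        processed_now.append(chunk_id)
    return processed_now
```

Recording completion only after `process` returns means a crash mid-chunk causes that chunk to be redone on the next run, which is safe as long as chunk processing is idempotent.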
Model / training

REG and HYB paths dropped. CRF-only. 10x3p_sc_ont_013 is the new default.
seq_orders and training_params moved from TSV → YAML with a cleaner schema.
Dynamic batch sizing now accounts for all layers, not just conv.
Training artifacts versioned and folder naming cleaned up.
QC

qc_metrics.py computes annotation- and BAM-level QC metrics and renders them as knee plots and boxplots, alongside MultiQC TSVs and an HTML report.
Dedup / BAM

UMI dedup significantly faster.
Minor tweaks to split_bam.
Packaging & CI

setup.py → pyproject.toml. Moved to setuptools-scm for versioning.
Fixed the upstream CI breakage: tag_regex + git_describe_command so legacy non-PEP440 tags like v0.2.1_tf2.15.0 are
stripped to 0.2.1.
Added a Docker publish workflow.
pytest now runs on push/PR to dev and annot_demux_refactor.
New unit tests for checkpoint chunk size, _version, and the seq_orders YAML refactor. Integration tests expanded.
Added container_runtime.py helper.
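
To show the kind of tag stripping the CI fix relies on, here is a small sketch using Python's `re` module. The pattern is a hypothetical stand-in, not the tag_regex committed in this PR; it only demonstrates how a named `version` group can pull 0.2.1 out of a legacy tag such as v0.2.1_tf2.15.0.

```python
import re

# Hypothetical pattern for illustration only: an optional leading "v",
# a dotted numeric version, then an optional "_suffix" that is discarded.
LEGACY_TAG = re.compile(r"^v?(?P<version>\d+(?:\.\d+)*)(?:_.*)?$")


def strip_legacy_tag(tag: str) -> str:
    """Return the numeric version part of a tag, dropping any '_...' suffix."""
    match = LEGACY_TAG.match(tag)
    if match is None:
        raise ValueError(f"unrecognized tag: {tag!r}")
    return match.group("version")


print(strip_legacy_tag("v0.2.1_tf2.15.0"))  # 0.2.1
```

Tags that already look like plain versions pass through unchanged under this pattern, so new PEP 440-style tags would not need special handling.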
Docs

Quarto site reorganized into per-stage pages: preprocessing, annotation, barcode/demux, align/dedup, QC, split BAM,
visualization, featurecounts.
New model-assessment guide; model-training and read-simulation docs rewritten.
Quick start, usage, install, resource-requirements pages refreshed.
syncing with master branch
AyushSemwal merged commit 263f51d into main on Apr 17, 2026
4 checks passed
@codecov
codecov Bot commented Apr 17, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

