updated schema #71
Merged
New commands
- barcode_correct: standalone Levenshtein correction against a whitelist; resumable, with optional inline demux via --run-demux.
- demux_reads: standalone FASTA/FASTQ export from annotations (demuxed or bulk).
- generate_whitelist: whitelist-free cell-barcode discovery via knee detection plus deletion-neighborhood near-duplicate merging.
- qc_metrics: annotation and BAM-level QC, knee plots, boxplots, MultiQC TSVs, HTML report. ~3.9k lines, no PyArrow.
- assess_model: per-segment accuracy assessment for trained models, with a generated report.
- featurecounts: gene-level count matrix from per-cell BAMs.

Pipeline restructure
- You can run barcode_correct and demux_reads against existing annotations, which is useful for resuming, re-running with different thresholds, or the whitelist-free flow.
- annotate_reads and visualize accept preprocessed directories directly.
- User-configurable bin width in preprocessing.
- Option to split concatenated reads during annotation.
- Checkpoint/resume across annotate, BC correct, and demux. Optional chunk cleanup.
- annotations_valid.parquet is auto-removed once the corrected file exists.
- Barcode correction handles UMI-less protocols and arbitrary barcode-type combos.
- All FASTA/FASTQ outputs are gzipped.

Model / training
- REG and HYB paths dropped; CRF-only.
- 10x3p_sc_ont_013 is the new default.
- seq_orders and training_params moved from TSV → YAML with a cleaner schema.
- Dynamic batch sizing now accounts for all layers, not just conv.
- Training artifacts versioned and folder naming cleaned up.

QC
- qc_metrics.py computes and visualizes multiple QC plots at various levels.

Dedup / BAM
- UMI dedup significantly faster.
- Minor tweaks to split_bam.

Packaging & CI
- setup.py → pyproject.toml.
- Moved to setuptools-scm for versioning.
- Fixed the upstream CI breakage: tag_regex + git_describe_command so legacy non-PEP440 tags like v0.2.1_tf2.15.0 are stripped to 0.2.1.
- Added a Docker publish workflow.
- pytest now runs on push/PR to dev and annot_demux_refactor.
- New unit tests for checkpoint chunk size, _version, and the seq_orders YAML refactor. Integration tests expanded.
- Added container_runtime.py helper.

Docs
- Quarto site reorganized into per-stage pages: preprocessing, annotation, barcode/demux, align/dedup, QC, split BAM, visualization, featurecounts.
- New model-assessment guide; model-training and read-simulation docs rewritten.
- Quick start, usage, install, resource-requirements pages refreshed.
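
For reviewers who want to sanity-check the legacy-tag handling described under Packaging & CI, here is a minimal sketch of how a tag such as v0.2.1_tf2.15.0 could be reduced to 0.2.1. The pattern and the function name strip_legacy_tag are hypothetical and written for illustration only; this is not the tag_regex actually configured in this PR, which may differ.

```python
import re

# Hypothetical pattern (an assumption, not the project's real tag_regex):
# optional leading "v", a three-part numeric version captured in a group
# named "version", then an optional "_..." or "-..." suffix that is discarded.
LEGACY_TAG = re.compile(r"^v?(?P<version>\d+\.\d+\.\d+)(?:[_-].*)?$")


def strip_legacy_tag(tag: str) -> str:
    """Return the numeric version part of a git tag, dropping any suffix."""
    match = LEGACY_TAG.match(tag)
    if match is None:
        raise ValueError(f"unrecognized tag format: {tag!r}")
    return match.group("version")


if __name__ == "__main__":
    for tag in ("v0.2.1_tf2.15.0", "v0.3.0"):
        print(tag, "->", strip_legacy_tag(tag))
```

Raising on unrecognized tags, rather than returning the input unchanged, is a design choice in this sketch so that an unexpected tag format fails loudly during a build instead of producing a non-PEP440 version.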
syncing with master branch
Updated schema
updated schema
Codecov Report: ✅ All modified and coverable lines are covered by tests.