Phase 1: Disjoint grouping core implementation #341

Open
s-canchi wants to merge 8 commits into disjoint-grouping from 001-disjoint-grouping

@s-canchi
Collaborator

Summary

  • Three independently callable subcommands: disjoint-group, per-group partition, assemble-groups
  • YAML manifest as handoff between steps
  • Works for paired (multi-locus) and unpaired (single-locus) data
  • Works for simulated and real data
  • Disjoint grouping does not degrade clustering accuracy (validated for both HA and vsearch against the standard partition)
  • CDR3 length disjointness validated on simulated and real data
  • Test framework integrated into test/test.py with reference results for all four configs
  • No regressions (partis-test.py --quick and --paired --no-simu pass)
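
The CDR3 length disjointness check mentioned above could look roughly like the sketch below. It assumes that "disjoint" means no CDR3 length appears in more than one group, and that each sequence is a dict with a `cdr3_length` key. The function names and the data layout are hypothetical and are not taken from the partis code in this PR.

```python
from itertools import combinations


def cdr3_lengths_by_group(groups):
    """Map each group name to the set of CDR3 lengths it contains.

    groups: dict of group name -> list of sequence dicts with a
    'cdr3_length' key (hypothetical layout, for illustration only).
    """
    return {
        name: {seq["cdr3_length"] for seq in seqs}
        for name, seqs in groups.items()
    }


def overlapping_groups(groups):
    """Return (group_a, group_b, shared_lengths) for every pair of
    groups that share at least one CDR3 length.

    An empty list means the grouping is disjoint in CDR3 length.
    """
    lengths = cdr3_lengths_by_group(groups)
    overlaps = []
    for a, b in combinations(sorted(lengths), 2):
        shared = lengths[a] & lengths[b]
        if shared:
            overlaps.append((a, b, sorted(shared)))
    return overlaps
```

An empty return value from `overlapping_groups()` corresponds to a passing disjointness check; a non-empty one names the offending group pairs and the lengths they share.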

Test plan

  • partis-test.py --quick passes
  • partis-test.py --paired --no-simu passes
  • Disjoint grouping tests pass for paired simu, paired data, unpaired simu, unpaired data
  • Clustering accuracy comparison shows no degradation

Related: #337

s-canchi and others added 8 commits March 11, 2026 17:04
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ouping

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…-dir arg

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…idation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… (T019)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@s-canchi s-canchi requested a review from psathyrella March 12, 2026 13:28
@s-canchi
Collaborator Author

Design choices in Phase 1

  1. Independent subcommands as internal machinery: three separate steps (disjoint-group, partition, assemble-groups) rather than a single command. A disjoint-partition wrapper for standard partis single-command UX is planned for the next phase.

  2. YAML manifest as handoff contract: new artifact type for partis, describing groups, file paths, and metadata between steps.

  3. Per-group FASTA files as intermediate artifacts: each step writes to disk rather than keeping data in memory, which enables independent parallel execution of the per-group runs. I/O metrics will be captured during scale validation to quantify the tradeoff.

  4. Custom merge logic instead of merge_paired_yamls(): merge_yamls() asserts matching partition step counts, which independent runs do not produce. Custom logic extracts only the best partition from each group and reconciles germlines.
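
As a rough illustration of design choice 4, the sketch below takes only the best partition from each group and concatenates them, so groups with different numbers of partition steps merge without any step-count assertion. It assumes "best" means the highest log probability and that each group's output is a list of `{"logprob", "partition"}` dicts. That layout and the names `best_partition` and `assemble` are hypothetical, and germline reconciliation is not shown.

```python
def best_partition(partitions):
    """Pick the partition with the highest log probability from one
    group's list of partition steps (assumed criterion for 'best')."""
    return max(partitions, key=lambda p: p["logprob"])["partition"]


def assemble(group_outputs):
    """Concatenate each group's best partition into one overall partition.

    group_outputs: dict of group name -> list of
        {"logprob": float, "partition": [[uid, ...], ...]}
    Groups may have different numbers of steps, since only the best
    step from each group is used.
    """
    merged = []
    seen = set()
    for name in sorted(group_outputs):
        for cluster in best_partition(group_outputs[name]):
            # Groups are supposed to be disjoint, so a repeated uid
            # indicates a problem upstream.
            duplicated = seen.intersection(cluster)
            if duplicated:
                raise ValueError(
                    "uids in more than one group: %s" % sorted(duplicated)
                )
            seen.update(cluster)
            merged.append(list(cluster))
    return merged
```

Because the groups are disjoint, the concatenated clusters form a valid partition of the full input; the duplicate check is only a guard against a broken grouping step.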
