
Enable FASTA reference inputs + fail-fast contig validation. #170

Merged
Kudostoy0u merged 6 commits into epifluidlab:main from edawson:edawson/mismatched-contig-early-exit
May 4, 2026

@edawson edawson commented Apr 28, 2026

Howdy,

My apologies that this got so long, but I hope it's helpful.

This PR adds a ReferenceWrapper class that enables FASTA reference genome inputs and provides a common API for both 2-bit and FASTA inputs. The class replaces all direct calls to py2bit in the code. Immediately after argument parsing, the contigs present in the BAM/CRAM/SAM/Frag(.gz) input are checked against the contigs in the reference genome. This prevents an error I was hitting often, where a mismatch was not reported clearly enough for me to debug.
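
As an illustration of the fail-fast idea described above, here is a minimal sketch. It is not the code in this PR: `validate_contigs` and `ContigMismatchError` are hypothetical names, and the rule that every input contig must be present in the reference is an assumption made for the example.

```python
from typing import Iterable


class ContigMismatchError(ValueError):
    """Hypothetical error type: input contigs are missing from the reference."""


def validate_contigs(
    input_contigs: Iterable[str],
    reference_contigs: Iterable[str],
    max_listed: int = 5,
) -> None:
    """Raise early if any input contig is absent from the reference.

    Assumption for this sketch: the input's contigs must be a subset of
    the reference's contigs. The real check may be stricter or looser.
    """
    reference = set(reference_contigs)
    # dict.fromkeys keeps first-seen order while dropping duplicates.
    missing = [c for c in dict.fromkeys(input_contigs) if c not in reference]
    if missing:
        listed = ", ".join(missing[:max_listed])
        raise ContigMismatchError(
            f"{len(missing)} input contig(s) not found in reference: {listed}"
        )
```

Running a check like this right after argument parsing means a `chr1` vs `1` naming mismatch is reported by name before any analysis starts.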

New Features:

  • FASTA reference support, in addition to 2-bit
  • Single interface for FASTA / 2-bit files
  • More aggressive contig checking upfront to surface mismatched refs/inputs sooner

Other changes:

  • Freezes dependency versions for security purposes. This assumes the code is then run in a venv, a conda env, or a Docker container, as per modern Python practice.
  • Moves the gen_kmers and reverse_complement functions out of end motifs / breakpoint motifs and into utils, so that each exists only once in the package.
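
For readers unfamiliar with the two helpers named in the last bullet, a rough sketch of what such shared utilities typically look like follows. The signatures and bodies are assumptions for illustration and may not match the functions in the package.

```python
from itertools import product

# Translation table for complementing bases; N maps to itself.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def gen_kmers(k: int, bases: str = "ACGT") -> list[str]:
    """Return all k-mers over `bases`, ordered by the order of `bases`."""
    return ["".join(p) for p in product(bases, repeat=k)]
```

Keeping one copy in utils lets both motif modules import the same definitions instead of maintaining duplicates.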

Testing:

  • I did not modify the tests.
  • All tests pass using pytest tests/

I have a follow-up PR that refactors the input_file as well, which is another huge chunk of code (I had to refactor some significant portions of the utils module to break a dependency cycle). Happy to help answer questions, and thank y'all for FinaleToolkit!

Best,
Eric

Eric Dawson added 2 commits April 28, 2026 17:17
…d enable FASTA inputs for references. Adds a logging wrapper with formatting for module-specific logging. Adds validation logic immediately after input parsing to make sure that invalid references fail before any analysis gets run.
@edawson edawson mentioned this pull request Apr 28, 2026
@Kudostoy0u Kudostoy0u self-assigned this Apr 29, 2026
@Kudostoy0u Kudostoy0u self-requested a review April 29, 2026 02:22
@Kudostoy0u Kudostoy0u removed their assignment Apr 29, 2026
Eric Dawson and others added 2 commits April 29, 2026 08:53
Author

edawson commented Apr 30, 2026

Howdy @Kudostoy0u - it would be great to add a test to make sure the FASTA file wrapper is working as expected. I can likely do so if needed, but I assume y'all have one for the 2-bit version and it seems ideal to simply drop the FASTA in its place and make sure it works as expected.

Filter parsed CLI args against target function signatures and add a coverage smoke test.
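
The commit message above describes filtering parsed CLI arguments against the signature of the function they are passed to. One way this could be done is sketched below. It is not the code from the commit: `filter_args_for` and `frag_length_stub` are hypothetical names introduced for the example.

```python
import argparse
import inspect
from typing import Any, Callable


def filter_args_for(func: Callable[..., Any], args: argparse.Namespace) -> dict[str, Any]:
    """Keep only the parsed arguments that `func` accepts by name."""
    params = inspect.signature(func).parameters
    # A function that takes **kwargs accepts every parsed argument.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(vars(args))
    return {name: value for name, value in vars(args).items() if name in params}


def frag_length_stub(input_file, contig=None, verbose=False):
    """Hypothetical target function standing in for a real subcommand."""
    return (input_file, contig, verbose)
```

With this approach, bookkeeping entries that argparse stores on the namespace (a subcommand name, for instance) are dropped before the call `frag_length_stub(**filter_args_for(frag_length_stub, args))`, instead of causing an unexpected-keyword `TypeError`.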
@Kudostoy0u
Collaborator

Hi @edawson! Good point, I'll add a small FASTA fixture and a regression test with it to check that the new FASTA reference support produces the same results as the existing .2bit test coverage.

Your PRs touch a large amount of code (thanks for your contributions!), so I’m also going to add another maintainer as a reviewer for a second pass.

Compare DELFI outputs from the existing 2-bit fixture and a matching FASTA fixture to cover the new reference backend.
Collaborator

@Kudostoy0u Kudostoy0u left a comment


Tested locally. The test suite and CI are passing, and the new FASTA reference path is now covered by a regression test.

Python 3.9 support was dropped since it's EOL and wasn't working well with the frozen dependency versions.

@Kudostoy0u Kudostoy0u merged commit 375a4d6 into epifluidlab:main May 4, 2026
4 checks passed