Enable FASTA reference inputs + fail-fast contig validation. #170
…d enable FASTA inputs for references. Adds a logging wrapper with formatting for module-specific logging. Adds validation logic immediately after input parsing to make sure that invalid references fail before any analysis gets run.
Solve the Python version–dependency mismatch
Howdy @Kudostoy0u - it would be great to add a test to make sure the FASTA file wrapper is working as expected. I can likely do so if needed, but I assume y'all have one for the 2-bit version and it seems ideal to simply drop the FASTA in its place and make sure it works as expected.
Filter parsed CLI args against target function signatures and add a coverage smoke test.
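One way to filter parsed CLI args against a target function's signature is with `inspect.signature`. The sketch below is illustrative only and is not taken from this PR: `filter_kwargs` and `example_command` are hypothetical names, and it assumes the arguments arrive as an `argparse.Namespace`.

```python
import inspect
from argparse import Namespace
from typing import Any, Callable, Dict


def filter_kwargs(func: Callable[..., Any], args: Namespace) -> Dict[str, Any]:
    """Keep only the parsed arguments that `func` can accept by keyword.

    Hypothetical helper: the real implementation may differ.
    """
    params = inspect.signature(func).parameters
    parsed = vars(args)

    # A function that takes **kwargs accepts every parsed argument.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(parsed)

    # Otherwise keep only names that can legally be passed as keywords.
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    accepted = {name for name, p in params.items() if p.kind in keyword_kinds}
    return {name: value for name, value in parsed.items() if name in accepted}


def example_command(input_file: str, reference_file: str, verbose: bool = False) -> str:
    """Stand-in for a subcommand entry point (hypothetical)."""
    return f"{input_file}|{reference_file}|{verbose}"
```

A dispatcher could then call `example_command(**filter_kwargs(example_command, args))`, so parser bookkeeping entries such as the subcommand name never reach the target function.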
Hi @edawson! Good point, I'll add a small FASTA fixture and a regression test with it to check that the new FASTA reference support produces the same results as the existing .2bit test coverage. Your PRs touch a large amount of code (thanks for your contributions!), so I’m also going to add another maintainer as a reviewer for a second pass.
Compare DELFI outputs from the existing 2-bit fixture and a matching FASTA fixture to cover the new reference backend.
Kudostoy0u
left a comment
Tested locally: the test suite and CI are passing, and the new FASTA reference path is now covered by a regression test.
Python 3.9 support was dropped since it's EOL and wasn't working well with the frozen dependency versions.
Howdy,
My apologies that this got so long, but I hope it's helpful.
This PR adds a ReferenceWrapper class that enables FASTA reference genome inputs and provides a common API for both 2-bit and FASTA inputs. This class replaces the calls to py2bit in the code. The contigs present in the BAM/CRAM/SAM/Frag(.gz) input are checked against the contigs in the reference genome immediately after argument parsing. This prevents an error I was hitting often, where a contig mismatch surfaced without enough detail for me to debug.
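For readers who want a picture of the fail-fast check, here is a minimal sketch, not the code in this PR. It assumes the contig names have already been collected from the input file and from the reference, and that every input contig must be present in the reference. `validate_contigs` and `ContigMismatchError` are hypothetical names.

```python
from typing import Iterable


class ContigMismatchError(ValueError):
    """Raised when input contigs are absent from the reference (hypothetical)."""


def validate_contigs(
    input_contigs: Iterable[str], reference_contigs: Iterable[str]
) -> None:
    """Raise before any analysis runs if the input names unknown contigs.

    Assumption: every contig in the input must exist in the reference.
    """
    reference = set(reference_contigs)
    # dict.fromkeys drops duplicates while keeping first-seen order,
    # so the error message lists contigs in the order they appear.
    missing = [c for c in dict.fromkeys(input_contigs) if c not in reference]
    if missing:
        raise ContigMismatchError(
            f"{len(missing)} contig(s) in the input are missing from the "
            f"reference: {', '.join(missing)}"
        )
```

Calling something like this right after argument parsing means a "1" versus "chr1" naming mismatch is reported by name up front, rather than surfacing as an opaque failure partway through a run.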
New Features:
Other changes:
gen_kmers and reverse_complement functions from end motifs / breakpoint motifs to utils, where they exist only once in the package.
Testing:
pytest tests/
I have a follow-up PR that refactors the input_file as well, which is another huge chunk of code (as I had to refactor some significant portions of the utils module to break a dependency cycle). Happy to help answer questions, and thank y'all for FinaleToolkit!
Best,
Eric