Skip to content

Added a fastq file parser#5

Merged
dawnandrew100 merged 15 commits intolignum-vitae:mainfrom
fragilefort:fastq-parser
Sep 13, 2025
Merged

Added a fastq file parser#5
dawnandrew100 merged 15 commits intolignum-vitae:mainfrom
fragilefort:fastq-parser

Conversation

@fragilefort
Copy link
Copy Markdown
Contributor

Added a fastq file parser. Let me know if anything need documentation or fixes. It can work with both files and multiline sequences. Note: motif.py has some issues and importing Dna class in fastq.py will case issues so before testing the parser, comment out lines in the init.py in analysis which imports motif.py until this file is fixed.

@dawnandrew100
Copy link
Copy Markdown
Member

So, it looks like the reason why the tests aren't passing is because of the numpy import. I think that's more of an issue with the action runner itself rather than the implementation, so I'm going to see what the minimum number of changes I need to make are to make that work.

I think I just need to add numpy to the dependencies of the pyproject.toml

The issues in motif.py might be the nested f-string, which isn't supported prior to Python 3.12
If you're using a version before that, it might be causing you trouble
For that, I think I just need to have different quotation marks for the nesting to work, so we'll see

dawnandrew100 and others added 6 commits September 4, 2025 19:25
Just trying to make the tests pass again
I'm making the fewest number of changes possible to make this happen and also adding python 3.10 and 3.11 back to the tests
github action runner says that versions of python before 3.12 aren't allowed
Everything works fine when I use tox locally to test out different python versions. I'm going to try one more time to fix this and if it doesn't work I'll fix this after the PR is done
@dawnandrew100
Copy link
Copy Markdown
Member

Okay, so the problem was that GitHub didn't acknowledge that the pyproject.toml said requires-python = ">=3.10" instead of requires-python = ">=3.12", but we did some voodoo, and now that is fixed

Will look at fastq.py properly in the morning
A few things I can see from a quick glance that should be changed

  1. Need to add a testing suite to the tests folder for the new fastq functionality
  2. Read -> FastqRecord for naming consistency with fasta.py
  3. Improve homogeneity between fasta.py and fastq.py (which honestly will most likely be me updating fasta.py where I see fit)
  4. Take out the gc content method from Read (no need for it here)

If you want to tackle 1, 2, and 4, that'd be great !

@fragilefort
Copy link
Copy Markdown
Contributor Author

Great, whatever it takes to convince him. I will complete 1,2 and 4 today and solve any additional comments with the implementation after review

@fragilefort
Copy link
Copy Markdown
Contributor Author

I did the mentioned edits on the fastq parser, and added a test suite and passed with pytest, also tested this time with python 3.13. For some reason, the GitHub test actions complains again about numpy :(

@fragilefort
Copy link
Copy Markdown
Contributor Author

So here is the thing, fetching didn't include the changes you did in this pull request so I did the changes pushed and it failed. Then I made the reapply the changes you made earlier today and it only passes 2 builds, not sure where to correct it this time.

@dawnandrew100
Copy link
Copy Markdown
Member

Will have to look over this tomorrow when I'm more awake, but basically, the changes that I made were to make fastq and fasta more uniform in their implementation

Obviously, not everything can be identical because they are two different file formats, and some of the attributes (like phred scores) are unique to fastq files, but the effort was to get close enough so that effectively the same methods could be used on either fasta or fastq

I also converted the fasta parser to a class that has an iterator (but also left the plain function just in case someone might find that useful), and I also added a plain fastq_parser function to fastq.py for the sake of file uniformity

@dawnandrew100
Copy link
Copy Markdown
Member

I'll also leave a comment here to remind myself to make another pull request after this one is merged that will officially bring back support for 3.10 and 3.11 testing through github actions

@dawnandrew100
Copy link
Copy Markdown
Member

This comment is a reminder to myself that I need to remove pytest from the project dependencies and add it to the optional dev dependencies

Something like

[project.optional-dependencies]
dev = ["pytest"]

And anyone working on the project can either just download pytest directly or pip install like this pip install biobase[dev]

Copy link
Copy Markdown
Member

@dawnandrew100 dawnandrew100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything looks good to me!

@dawnandrew100 dawnandrew100 merged commit 1e4d117 into lignum-vitae:main Sep 13, 2025
2 checks passed
@fragilefort fragilefort deleted the fastq-parser branch October 4, 2025 16:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants