…private_data Differentiate location fields for private/public
…tion Improve documentation
Nice!
I have a concern about keeping that script in this repo. We strictly limited the taxonomy in genome_uploader and its docs to the NCBI type, yet the script has a GTDB parser. Why? Are you sure it will work for GTDB? And if it does, why do we use the gtdb_to_ncbi.py converter from the GTDB-Tk repo? I'm a bit confused.
I think there are two options:
- limit the script to NCBI and remove any parsing or mention of GTDB from the repo. A separate script that parses both could be added to the toolkit, for example.
- accept GTDB and convert on the fly (which is not possible, I guess)
And maybe add a CLI? (the ability to pass an input file and to run the script from outside the repo, not only inside it)
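To illustrate the CLI suggestion above, here is a minimal sketch using `argparse`. Everything in it is an assumption made for illustration: the `--input` flag, the one-lineage-per-line file format, and the names `build_parser`, `last_named_rank` and `main` are hypothetical and are not the actual interface of taxon_finder.py. The placeholder lookup only returns the last non-empty rank of a semicolon-separated lineage and makes no ENA queries.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build the (hypothetical) command-line interface."""
    parser = argparse.ArgumentParser(
        description="Look up submission taxa for lineages listed in a file."
    )
    parser.add_argument(
        "-i", "--input", type=Path, required=True,
        help="Text file with one semicolon-separated NCBI lineage per line.",
    )
    return parser


def last_named_rank(lineage: str) -> str:
    """Placeholder for the real lookup: return the last non-empty rank."""
    ranks = [rank.strip() for rank in lineage.split(";") if rank.strip()]
    return ranks[-1] if ranks else ""


def main(argv=None) -> list[str]:
    """Parse arguments, read the input file, and process each lineage."""
    args = build_parser().parse_args(argv)
    results = []
    for line in args.input.read_text().splitlines():
        if line.strip():  # skip blank lines
            results.append(last_named_rank(line))
    return results
```

Keeping `main` free of `print` calls and letting it accept an explicit `argv` makes it easy to test; a `__main__` guard or a console-script entry point could then call it and print the results.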
The functions ena.query_scientific_name and ena.query_taxid are used only by taxon_finder.py. I think we should move them out of ena and directly into the script.
…hub.com/EBI-Metagenomics/genome_uploader into feature/taxonomy_submission_improvement
Hey @KateSakharova, thanks for your comments. Thanks for catching the GTDB comments; they are remnants of when the script used to have the GTDB taxonomy converter inside, before it was exported to the GGP. I should have removed all instances now.
I might add this one, since I had to write a wrapper anyway to use it. But if so, I'd like to write a test too... I'll add it to my to-do list for when I have 15 minutes to spare.
Not sure. I thought that, since they query ena-api, it might be better context-wise to keep them there in case we want to reuse them one day?
KateSakharova
left a comment
I've added some changes and a minimal test, because I had to modify the script for myself anyway while preparing course materials.
I think we can merge it now. I want to unblock you on that PR. But the way we find species for submission sounds weird to me; I didn't know the uploader works like that. We will probably need to discuss it in the future to improve it.
You can add more tests if you want; it should be very easy now. I only added the ones I had.
The taxonomy extraction script has been made into an independent module. A few taxonomic rules have been added, based on use cases I had previously found in our genomes and in the NCBI taxonomy.
This also adds to main documentation refinements that have already been reviewed for the dev branch.