Description
Bug Report
When the _resolve_genbank_accession() function in podp_antismash_downloader.py resolves the GenBank accession of a genome assembly to its RefSeq accession, it raises a ValueError if no RefSeq accession exists for that assembly, because max() is called on an empty sequence. As a result, processing crashes whenever such an assembly is encountered.
Steps to Reproduce
- Use the NCBI Datasets API to fetch the RefSeq assembly ID for a given GenBank assembly ID.
- If a RefSeq accession is available, the function operates as expected. Example:
```json
{
  "assembly_revisions": [
    {
      "genbank_accession": "GCA_000175835.1",
      "refseq_accession": "GCF_000175835.1",
      "assembly_name": "ASM17583v1",
      "assembly_level": "contig",
      "release_date": "2009-12-15"
    }
  ],
  "total_count": 1
}
```
- However, when the API response does not include a refseq_accession, the function fails. Example:
```json
{
  "assembly_revisions": [
    {
      "genbank_accession": "GCA_003326215.1",
      "assembly_name": "ASM332621v1",
      "assembly_level": "contig",
      "release_date": "2018-07-18",
      "sequencing_technology": "Illumina MiSeq"
    }
  ],
  "total_count": 1
}
```
- This results in the following error:
```
File ~/coding/NPLinker_workshop_2025/nplinker/src/nplinker/genomics/antismash/podp_antismash_downloader.py:284, in _resolve_genbank_accession(genbank_id)
    282     if resp.status_code == httpx.codes.OK:
    283         data = resp.json()
--> 284         latest_entry = max(
    285             (entry for entry in data["assembly_revisions"] if "refseq_accession" in entry),
    286             key=lambda x: x["release_date"],
    287         )
    288         refseq_id = latest_entry["refseq_accession"]
    289 except httpx.ReadTimeout:

ValueError: max() arg is an empty sequence
```
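The failure can be reproduced offline with only the standard library. The sketch below parses the second example response and evaluates the same max() expression shown at lines 284-287 of the traceback. No network request is made, so it only mimics what _resolve_genbank_accession() does after resp.json() returns; the variable names are illustrative.

```python
import json

# Response body for GCA_003326215.1, copied from the example above:
# its only revision has no "refseq_accession" key.
response_text = """
{ "assembly_revisions": [ { "genbank_accession": "GCA_003326215.1",
  "assembly_name": "ASM332621v1", "assembly_level": "contig",
  "release_date": "2018-07-18", "sequencing_technology": "Illumina MiSeq" } ],
  "total_count": 1 }
"""
data = json.loads(response_text)

error_message = None
# Same expression as in the traceback: the generator yields nothing,
# so max() has no element to return and raises ValueError.
candidates = (entry for entry in data["assembly_revisions"] if "refseq_accession" in entry)
try:
    latest_entry = max(candidates, key=lambda x: x["release_date"])
except ValueError as exc:
    error_message = str(exc)
    print(f"ValueError: {exc}")
```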
Suggested Fix
Return an empty string from _resolve_genbank_accession() when no RefSeq accession is found, instead of letting max() raise a ValueError.
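One way to implement this fix is sketched below, assuming the parsed response has the shape of the examples above. pick_latest_refseq() is a hypothetical helper, not a function in nplinker; it isolates the selection logic so it can be tested without calling the NCBI Datasets API. It relies on the default keyword of Python's built-in max(), which is returned instead of raising when the iterable is empty. Comparing release_date as strings works here because the dates are in ISO 8601 (YYYY-MM-DD) format.

```python
from typing import Any


def pick_latest_refseq(data: dict[str, Any]) -> str:
    """Return the RefSeq accession of the newest revision, or "" if none has one.

    Hypothetical helper: `data` is assumed to be the parsed JSON body of the
    NCBI Datasets response shown in the issue.
    """
    latest_entry = max(
        (entry for entry in data.get("assembly_revisions", []) if "refseq_accession" in entry),
        key=lambda x: x["release_date"],  # ISO dates sort correctly as strings
        default=None,  # returned instead of raising ValueError on an empty sequence
    )
    if latest_entry is None:
        return ""
    return latest_entry["refseq_accession"]


with_refseq = {
    "assembly_revisions": [
        {"genbank_accession": "GCA_000175835.1", "refseq_accession": "GCF_000175835.1",
         "release_date": "2009-12-15"},
    ],
    "total_count": 1,
}
without_refseq = {
    "assembly_revisions": [
        {"genbank_accession": "GCA_003326215.1", "release_date": "2018-07-18"},
    ],
    "total_count": 1,
}
print(repr(pick_latest_refseq(with_refseq)))
print(repr(pick_latest_refseq(without_refseq)))
```

Because an empty string is falsy, a caller can check `if not refseq_id:` to skip assemblies that have no RefSeq counterpart rather than crashing.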
Metadata
Labels: No labels
Status: Backlog