Bug report: PODP mode - ValueError in _resolve_genbank_accession() When No RefSeq Accession Exists #307

@liannette

Bug Report

When _resolve_genbank_accession() in podp_antismash_downloader.py resolves the GenBank accession of a genome assembly to its RefSeq accession, it raises a ValueError if no RefSeq accession exists for that assembly. As a result, processing crashes for any such assembly.

Steps to Reproduce

  1. Use the NCBI Datasets API to fetch the RefSeq accession for a given GenBank assembly accession.
  2. If the response contains a RefSeq accession, the function works as expected. Example response:
    {
      "assembly_revisions": [
        {
          "genbank_accession": "GCA_000175835.1",
          "refseq_accession": "GCF_000175835.1",
          "assembly_name": "ASM17583v1",
          "assembly_level": "contig",
          "release_date": "2009-12-15"
        }
      ],
      "total_count": 1
    }
  3. However, when no entry in the API response includes a refseq_accession field, the function fails. Example response:
    {
      "assembly_revisions": [
        {
          "genbank_accession": "GCA_003326215.1",
          "assembly_name": "ASM332621v1",
          "assembly_level": "contig",
          "release_date": "2018-07-18",
          "sequencing_technology": "Illumina MiSeq"
        }
      ],
      "total_count": 1
    }
  4. This results in the following error:
    File ~/coding/NPLinker_workshop_2025/nplinker/src/nplinker/genomics/antismash/podp_antismash_downloader.py:284, in _resolve_genbank_accession(genbank_id)
        282     if resp.status_code == httpx.codes.OK:
        283         data = resp.json()
    --> 284         latest_entry = max(
        285             (entry for entry in data["assembly_revisions"] if "refseq_accession" in entry),
        286             key=lambda x: x["release_date"],
        287         )
        288         refseq_id = latest_entry["refseq_accession"]
        289 except httpx.ReadTimeout:
    
    ValueError: max() arg is an empty sequence
    
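The failure can be reproduced without any network access. The sketch below hard-codes the two example responses from steps 2 and 3 as Python dicts and applies the same max() expression shown in the traceback (lines 284-288). The helper name `latest_refseq` is hypothetical; it only mirrors the current selection logic and is not part of nplinker.

```python
# Minimal, network-free reproduction of the selection logic in
# _resolve_genbank_accession(). The payloads are the example NCBI Datasets
# responses quoted in this issue.

WITH_REFSEQ = {
    "assembly_revisions": [
        {
            "genbank_accession": "GCA_000175835.1",
            "refseq_accession": "GCF_000175835.1",
            "assembly_name": "ASM17583v1",
            "assembly_level": "contig",
            "release_date": "2009-12-15",
        }
    ],
    "total_count": 1,
}

WITHOUT_REFSEQ = {
    "assembly_revisions": [
        {
            "genbank_accession": "GCA_003326215.1",
            "assembly_name": "ASM332621v1",
            "assembly_level": "contig",
            "release_date": "2018-07-18",
            "sequencing_technology": "Illumina MiSeq",
        }
    ],
    "total_count": 1,
}


def latest_refseq(data: dict) -> str:
    """Hypothetical helper copying the current (failing) selection logic."""
    latest_entry = max(
        (entry for entry in data["assembly_revisions"] if "refseq_accession" in entry),
        key=lambda x: x["release_date"],
    )
    return latest_entry["refseq_accession"]


print(latest_refseq(WITH_REFSEQ))  # GCF_000175835.1

try:
    latest_refseq(WITHOUT_REFSEQ)
except ValueError as exc:
    # The filtered generator yields nothing, so max() has no items to compare.
    print(f"ValueError: {exc}")
```

The exact wording of the ValueError message can differ between Python versions, but the exception type is the same.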

Suggested Fix

Return an empty string from _resolve_genbank_accession() when no RefSeq accession is found, instead of raising.
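One possible shape for this fix is sketched below. It assumes the function should keep returning a plain string and uses the `default` keyword of the built-in max() so that an empty set of candidates yields None instead of raising, which is then mapped to "". `select_refseq_accession` is a hypothetical pure helper that takes the already-parsed JSON, so the logic can be tested without calling the NCBI Datasets API; it is not the project's actual implementation.

```python
from typing import Any, Optional


def select_refseq_accession(data: dict[str, Any]) -> str:
    """Return the RefSeq accession of the most recently released revision.

    Hypothetical helper: returns "" when no revision carries a
    "refseq_accession" field, instead of letting max() raise ValueError.
    """
    candidates = (
        entry
        for entry in data["assembly_revisions"]
        if "refseq_accession" in entry
    )
    # default=None makes max() return None for an empty iterable.
    # release_date strings are ISO dates (YYYY-MM-DD), so string order is date order.
    latest_entry: Optional[dict[str, Any]] = max(
        candidates, key=lambda entry: entry["release_date"], default=None
    )
    if latest_entry is None:
        return ""
    return latest_entry["refseq_accession"]


# Usage with the two example responses from this issue (trimmed):
print(select_refseq_accession({"assembly_revisions": [
    {"genbank_accession": "GCA_000175835.1",
     "refseq_accession": "GCF_000175835.1",
     "release_date": "2009-12-15"}]}))  # GCF_000175835.1
print(repr(select_refseq_accession({"assembly_revisions": [
    {"genbank_accession": "GCA_003326215.1",
     "release_date": "2018-07-18"}]})))  # ''
```

Returning "" keeps the return type a plain str, so callers would need to treat an empty string as "no RefSeq accession available" and skip the assembly accordingly.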

Metadata

Assignees: No one assigned
Labels: No labels
Type: No type
Project status: Backlog
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests