Bug: parseCIF fails on valid mmCIF when _atom_site.auth_seq_id is '.' for non-polymer atoms
I'm parsing an OpenFold3-generated mmCIF file with prody.parseCIF(...) (ProDy v2.6.1 installed via pip, Python 3.12.12) and get:
ValueError: invalid literal for int() with base 10: '.'
The failure occurs in ciffile.py when ProDy reads _atom_site.auth_seq_id and assumes it can always be cast to int.
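The underlying cast can be reproduced in isolation: Python's int() rejects the mmCIF placeholder '.' with exactly the message above. This is an illustrative sketch with a made-up column of values, not ProDy's actual code path in ciffile.py.

```python
# Illustrative reproduction of the failing cast (not ProDy's own code).
# Hypothetical auth_seq_id column: polymer residues, then a ligand atom with '.'.
auth_seq_ids = ["1", "2", "3", "."]

def cast_all(values):
    """Cast every auth_seq_id to int unconditionally, as the parser appears to do."""
    return [int(v) for v in values]

try:
    cast_all(auth_seq_ids)
except ValueError as exc:
    error_message = str(exc)
    print(error_message)  # invalid literal for int() with base 10: '.'
```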
In the mmCIF file:
- polymer atoms have integer auth_seq_id values
- non-polymer ligand atoms (LIG0, chain B) have _atom_site.auth_seq_id = '.'
- the file also contains _pdbx_nonpoly_scheme.auth_seq_num = 1 for that ligand
According to the wwPDB mmCIF dictionary, _atom_site.auth_seq_id is optional and “not necessarily a number,” so this seems to be a parser bug rather than a file-format violation.
Expected behavior:
- ProDy should tolerate missing or non-numeric auth_seq_id values, especially for non-polymers, instead of unconditionally casting them to int.
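One possible shape for the tolerant behavior, as a minimal sketch: treat the CIF null markers ('.' for inapplicable, '?' for unknown) and other non-numeric values as missing, and optionally fall back to the ligand's _pdbx_nonpoly_scheme.auth_seq_num. The helper names parse_auth_seq_id and resolve_auth_seq_ids are hypothetical, not part of ProDy's API, and keying the fallback by chain ID is a simplifying assumption.

```python
from typing import Iterable, Mapping, Optional, Tuple

# mmCIF null markers: '.' = inapplicable, '?' = unknown/missing.
CIF_NULLS = {".", "?"}

def parse_auth_seq_id(value: str, fallback: Optional[int] = None) -> Optional[int]:
    """Hypothetical helper: return value as int, or `fallback` if it is null/non-numeric."""
    if value in CIF_NULLS:
        return fallback
    try:
        return int(value)
    except ValueError:
        # auth_seq_id is "not necessarily a number", so do not crash on it.
        return fallback

def resolve_auth_seq_ids(
    rows: Iterable[Tuple[str, str]],
    nonpoly_auth_seq_num: Mapping[str, int],
) -> list:
    """Hypothetical helper: resolve auth_seq_id per atom row.

    rows: (chain_id, raw auth_seq_id) pairs from _atom_site.
    nonpoly_auth_seq_num: assumed chain_id -> auth_seq_num mapping built from
    _pdbx_nonpoly_scheme; consulted only when auth_seq_id is missing.
    """
    return [
        parse_auth_seq_id(raw, fallback=nonpoly_auth_seq_num.get(chain_id))
        for chain_id, raw in rows
    ]

rows = [("A", "1"), ("A", "2"), ("B", ".")]
print(resolve_auth_seq_ids(rows, {"B": 1}))  # [1, 2, 1]
```

Returning None when no fallback exists keeps the parse going; a real fix might instead synthesize a residue number or group ligand atoms by label_asym_id.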