-
Notifications
You must be signed in to change notification settings - Fork 749
Implementing gemmi-based mmcif reader (with easy extension to PDB/PDBx and mmJSON)
#4712
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Changes from all commits
aa2a88f
218cf43
7f78e02
6682d6e
817f3a0
3cc8c80
f1bf325
d21c220
2a1be15
77645e6
91e6942
9a0c086
9c731df
8b40ec7
bdcbd73
401a4d3
9c336bd
def88e4
cabfd37
4c9d930
184491a
3de8565
ca6ebbb
45077ad
9a1a59a
27c10d6
950cfcf
8d1a8b5
ef29338
caca17e
f0e49cc
b7ada7c
e80632c
10f3124
98353fe
35fa187
e68fcce
263e9f1
ba47d53
9ffb6f2
fcfc6c0
0de720e
236b286
b562115
816b23f
92ae164
88c64a3
59b7e29
f2c23c8
776676e
71e60f4
36b7125
b058941
ef30fa7
95572c1
a8a9436
6706bbe
8cf9da4
f13156b
dda981c
ebdf849
47043f6
9770d7b
fd7f70d
1493056
3d7fbb9
9b9286e
b8f3c04
0f38a2d
b915aab
0d61248
34d76ca
b242aa5
4fc3a78
14fa756
e3a9a1f
d492b4e
1880e4a
927d7a0
ad0f0be
e03c3e5
4d79205
32d7cf9
0df8c3a
05c6ea1
88dab79
a82fe52
e3f1714
db46016
32cd103
805089e
55c3dbb
cd201d0
81f0b5b
22d1cca
d1ba434
a03b56f
bd4c255
3d61dc5
aed9b54
53c51f4
3e0324c
205c910
f9f7912
9a90316
1c5a549
dfc10e6
aae46c8
9cf4027
522e125
0b0ec81
b3d7c1c
801d85f
bdd070e
6b2c6c6
1a7f607
157c365
1d01a3f
2020484
d362f91
959d78b
15441f0
e2e097f
bc89c0b
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -21,8 +21,8 @@ inputs: | |||||
| default: 'codecov' | ||||||
| cython: | ||||||
| default: 'cython' | ||||||
| filelock: | ||||||
| default: 'filelock' | ||||||
| fasteners: | ||||||
| default: 'fasteners' | ||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Please add optional deps down in the optional deps section below.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. that's on me not merging develop-2 |
||||||
| griddataformats: | ||||||
| default: 'griddataformats' | ||||||
| gsd: | ||||||
|
|
@@ -60,6 +60,8 @@ inputs: | |||||
| default: 'dask' | ||||||
| distopia: | ||||||
| default: 'distopia>=0.4.0' | ||||||
| gemmi: | ||||||
| default: 'gemmi' | ||||||
| h5py: | ||||||
| default: 'h5py>=2.10' | ||||||
| hole2: | ||||||
|
|
@@ -138,6 +140,7 @@ runs: | |||||
| ${{ inputs.dask }} | ||||||
| ${{ inputs.distopia }} | ||||||
| ${{ inputs.gsd }} | ||||||
| ${{ inputs.gemmi }} | ||||||
| ${{ inputs.h5py }} | ||||||
| ${{ inputs.hole2 }} | ||||||
| ${{ inputs.imdclient }} | ||||||
|
|
||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,181 @@ | ||
| # -*- Mode: python; tab-width: 4; indent-tabs-mode:nil; coding:utf-8 -*- | ||
| # vim: tabstop=4 expandtab shiftwidth=4 softtabstop=4 | ||
| # | ||
| """ | ||
| MMCIF structure files in MDAnalysis --- :mod:`MDAnalysis.coordinates.MMCIF` | ||
| ========================================================================== | ||
|
|
||
| MDAnalysis reads coordinates from MMCIF (macromolecular Crystallographic Information File) files, also known as PDBx/mmCIF format, | ||
| using the ``gemmi`` library as a backend. MMCIF is a more modern and flexible alternative to the PDB format, | ||
| capable of storing detailed structural and experimental data about biological macromolecules. | ||
|
|
||
| MMCIF files use a structured, tabular format with key-value pairs to store both coordinate and atom information. | ||
| The format supports multiple models/frames, though this implementation currently only reads the first model | ||
| and provides warning messages for multi-model files. | ||
|
|
||
| Basic usage | ||
| ----------- | ||
|
|
||
| Reading an MMCIF file is straightforward: | ||
|
|
||
| .. code-block:: python | ||
|
|
||
| import MDAnalysis as mda | ||
| u = mda.Universe("structure.cif") | ||
|
|
||
|
|
||
| The reader will automatically detect if the structure contains placeholder unit cell information | ||
| (usually it's the case for cryoEM structures, and cell parameters are (1, 1, 1, 90, 90, 90)) | ||
| and set dimensions to None in that case. | ||
|
|
||
| Capabilities | ||
| ------------ | ||
|
|
||
| The MMCIF reader implementation uses the gemmi library to parse files and extract coordinates | ||
| and unit cell information. Currently only reading capability is supported, with the following | ||
| features: | ||
|
|
||
| - Single frame/model reading | ||
| - Unit cell dimensions detection | ||
| - Support for compressed .cif.gz files | ||
| - Automatic handling of placeholder unit cells for cryoEM structures | ||
|
|
||
| Examples | ||
| -------- | ||
|
|
||
| Basic structure loading:: | ||
|
|
||
| .. code-block:: python | ||
yuxuanzhuang marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
|
||
| # Load structure from MMCIF | ||
| u = mda.Universe("structure.cif") | ||
|
|
||
| # or from cif.gz file | ||
| u = mda.Universe("structure.cif.gz") | ||
|
|
||
| Classes | ||
| ------- | ||
|
|
||
| .. autoclass:: MMCIFReader | ||
| :members: | ||
| :inherited-members: | ||
|
|
||
| See Also | ||
yuxuanzhuang marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| -------- | ||
| - wwPDB MMCIF Resources: <http://mmcif.wwpdb.org>_ | ||
| - Gemmi library documentation: <https://gemmi.readthedocs.io>_ | ||
|
|
||
| .. versionadded:: 2.9.0 | ||
| """ | ||
|
|
||
| import logging | ||
| import warnings | ||
|
|
||
| import numpy as np | ||
|
|
||
| from . import base | ||
| from ..lib import util | ||
|
|
||
| try: | ||
| import gemmi | ||
|
|
||
| HAS_GEMMI = True | ||
| except ImportError: | ||
| HAS_GEMMI = False | ||
|
|
||
| logger = logging.getLogger("MDAnalysis.coordinates.MMCIF") | ||
|
|
||
|
|
||
| def get_coordinates(model: "gemmi.Model") -> np.ndarray: | ||
| """Get coordinates of all atoms in the `gemmi.Model` object. | ||
|
|
||
| Parameters | ||
| ---------- | ||
| model | ||
| input `gemmi.Model`, e.g. `gemmi.read_structure('file.cif')[0]` | ||
|
|
||
| Returns | ||
| ------- | ||
| np.ndarray, shape [n, 3], where `n` is the number of atoms in the structure. | ||
| """ | ||
| return np.array( | ||
| [[*at.pos.tolist()] for chain in model for res in chain for at in res] | ||
| ) | ||
|
|
||
|
|
||
| class MMCIFReader(base.SingleFrameReaderBase): | ||
| """Reads from an MMCIF file using ``gemmi`` library as a backend. | ||
|
|
||
| Notes | ||
| ----- | ||
|
|
||
| If the structure represents an ensemble, only the first structure in the ensemble | ||
| is read here (and a warning is thrown). Also, if the structure has a placeholder "CRYST1" | ||
| record (1, 1, 1, 90, 90, 90), it's set to ``None`` instead. | ||
|
|
||
| .. versionadded:: 2.9.0 | ||
| """ | ||
|
|
||
| format = ["cif", "cif.gz", "mmcif", "mmcif.gz"] | ||
| units = {"time": None, "length": "Angstrom"} | ||
|
|
||
| def _read_first_frame(self): | ||
| structure = self._get_structure() | ||
| cell_dims = np.array( | ||
| [ | ||
| getattr(structure.cell, name) | ||
| for name in ("a", "b", "c", "alpha", "beta", "gamma") | ||
| ] | ||
| ) | ||
| if len(structure) > 1: | ||
| warnings.warn( | ||
| f"File {self.filename} has {len(structure)=} models, but only the first one will be read" | ||
| ) | ||
|
|
||
| model = structure[0] | ||
| coords = get_coordinates(model) | ||
| self.n_atoms = len(coords) | ||
| self.ts = self._Timestep.from_coordinates(coords, **self._ts_kwargs) | ||
| if np.allclose(cell_dims, np.array([1.0, 1.0, 1.0, 90.0, 90.0, 90.0])): | ||
| warnings.warn( | ||
| "1 A^3 CRYST1 record," | ||
| " this is usually a placeholder." | ||
| " Unit cell dimensions will be set to None." | ||
| ) | ||
| self.ts.dimensions = None | ||
| else: | ||
| self.ts.dimensions = cell_dims | ||
| self.ts.frame = 0 | ||
|
|
||
| def _get_structure(self): | ||
| # This method exists because of some lacking methods in the gemmi Python API. | ||
| # within gemmi in C++, one can call `read_structure` and in-memory, string, and filepath | ||
| # arguments will all be accepted: | ||
| # https://github.com/project-gemmi/gemmi/blob/4416e298f204b7b57bf5b3051d7efd4fe02957cf/include/gemmi/mmread.hpp#L86 | ||
|
|
||
| # However, for MDA to similarly accept common input types like streams (open File-like objs and StringIO objs) | ||
| # as well as pathlib.Path() objects, we have to use the Python API methods available currently (as of 0.7.3) | ||
| # with a string as a common target for all input types | ||
| # For this, we call gemmi.cif.read_string (https://gemmi.readthedocs.io/en/latest/cif.html#reading) to handle CIF | ||
| # strings and gemmi.read_pdb to handle PDB strings (no one method can handle both formats currently Py-side) | ||
|
|
||
| # openany() is called instead of passing file paths (when available) differently from streams | ||
| # even though reading the file into a string is less efficient, this is easier to maintain | ||
|
|
||
| # if the gemmi Python API is extended, this method can be simplified/removed and replaced with something like | ||
| # gemmi.read_structure | ||
|
|
||
| with util.openany(self.filename) as f: | ||
| content_as_str = f.read() | ||
| try: | ||
| # String -> Doc -> Block -> Structure | ||
| # making Structure from first Block in Document as is done internally in gemmi: | ||
| # https://github.com/project-gemmi/gemmi/blob/4416e298f204b7b57bf5b3051d7efd4fe02957cf/include/gemmi/mmcif.hpp#L32 | ||
| return gemmi.make_structure_from_block( | ||
| gemmi.cif.read_string(content_as_str)[0] | ||
| ) | ||
| except ValueError as e: | ||
| try: | ||
| return gemmi.read_pdb_string(content_as_str) | ||
| except ValueError: | ||
| raise e | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's on me not merging develop-1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this is not fixed yet?