Releases: RosettaCommons/atomworks
v2.2.0
2.2.0 (2025-12-19)
Features
- add automated mmJSON parsing test and refactor buffer file type inference in `io_utils.py`. (5db3631)
- Add example CIF/JSON data, update dependencies, and modify I/O parsing utilities. (ba00b49)
- Add mmjson file type support, update file type inference, and introduce a parsing verification script with example data. (0a995dc)
v2.1.2
v2.1.1
v2.1.0
v2.0.0
Performance
- Parser 2-3x faster: Significant optimizations to structure parsing, especially for symmetric assemblies
- Cache loading 3-5x faster: Improved pickle/gzip cache handling with 2-level directory sharding for better filesystem performance
- Vectorized annotations: `add_pn_unit_iid_annotation()` now uses boolean masks instead of expensive subarray operations (10-100x speedup on symmetric assemblies)
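The boolean-mask idea behind the vectorized annotation can be illustrated with plain NumPy. This is only a sketch of the general technique, not the actual implementation of `add_pn_unit_iid_annotation()`; the function name, the two input arrays, and the `<unit>_<transform>` label format are assumptions made for the example.

```python
import numpy as np


def label_instances_with_masks(unit_ids, transform_ids):
    """Build one instance label per atom by writing through boolean masks.

    Hypothetical helper: the name, inputs, and label format are illustrative
    and are not taken from the atomworks code base.
    """
    unit_ids = np.asarray(unit_ids)
    transform_ids = np.asarray(transform_ids)
    labels = np.empty(unit_ids.shape[0], dtype=object)

    for unit in np.unique(unit_ids):
        for transform in np.unique(transform_ids):
            # One elementwise comparison selects every atom of this instance;
            # no sub-array is sliced out, copied, or concatenated back.
            mask = (unit_ids == unit) & (transform_ids == transform)
            labels[mask] = f"{unit}_{transform}"
    return labels


labels = label_instances_with_masks(
    unit_ids=["A", "A", "B", "A", "B"],
    transform_ids=[1, 1, 1, 2, 2],
)
```

The output array is allocated once and filled in place, which is why this pattern tends to scale better than extracting and re-stacking one sub-array per symmetric copy.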
Breaking Changes
Dataset Module Restructuring
The dataset module has been restructured to align with TorchVision/TorchAudio and HuggingFace conventions, using a dataset/loader pattern:
- Removed `dataset.dataset` nesting: Datasets are now flat; access data directly from the dataset object
- MetadataRowParser deprecated: The `StructuralDatasetWrapper` + `dataset_parser` pattern is replaced with a `loader` parameter directly on datasets (backwards-compatible but deprecated)
Migration example:
    # Old (deprecated)
    from atomworks.ml.datasets import StructuralDatasetWrapper, PandasDataset
    from atomworks.ml.datasets.parsers import PNUnitsDFParser

    dataset = StructuralDatasetWrapper(
        dataset=PandasDataset(data="df.parquet"),
        dataset_parser=PNUnitsDFParser(...)
    )

    # New
    from atomworks.ml.datasets import PandasDataset
    from atomworks.ml.datasets.loaders import create_base_loader

    dataset = PandasDataset(
        data="df.parquet",
        loader=create_base_loader(
            example_id_colname="example_id",
            path_colname="path",
        )
    )

Parser Changes
- CCD mirror path validation: `ccd_mirror_path` now raises `FileNotFoundError` if the path doesn't exist. Pass `None` explicitly to use Biotite's bundled CCD
- `build_assembly="_spoof"` removed: Use `"all"` instead (raises deprecation warning)
- `convert_mse_to_met` default changed: Now `True` by default (was `False`)
- `STANDARD_PARSER_ARGS` renamed: Was `DEFAULT_PARSE_KWARGS`; now uses tuples instead of lists for hashability
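The switch from lists to tuples matters whenever parser arguments are hashed, for example to build a cache key. The sketch below shows the general Python behavior only; `cache_key` and the `example_option` entry are hypothetical and are not part of the atomworks API.

```python
def cache_key(parser_args):
    """Hash a dict of parser arguments (hypothetical helper for illustration)."""
    # Sorting by key makes the hash independent of insertion order.
    return hash(tuple(sorted(parser_args.items())))


# Tuple values are hashable, so the whole argument set can be hashed.
tuple_args = {
    "build_assembly": "all",
    "convert_mse_to_met": True,
    "example_option": ("HOH", "DOD"),  # made-up option name and values
}
key = cache_key(tuple_args)

# The same arguments with a list value cannot be hashed.
list_args = dict(tuple_args, example_option=["HOH", "DOD"])
try:
    cache_key(list_args)
    list_is_hashable = True
except TypeError:
    list_is_hashable = False
```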
Environment Changes
- Removed automatic `.env` loading: `dotenv` is no longer auto-loaded on import. Call `load_dotenv()` explicitly if needed: `from dotenv import load_dotenv; load_dotenv()`
Removed Exports
- `monkey_patch_atomarray` removed from top-level exports. Use `from atomworks.biotite_patch import monkey_patch_biotite` instead
Added
New Modules
- `atomworks.ml.conditions` - Unified conditioning management for model training
- `atomworks.ml.preprocessing.msa` - MSA preprocessing (organize, filter, generate)
- `atomworks.ml.executables` - External executable management (hbplus, hhfilter, mmseqs2, x3dna)
- `atomworks.ml.transforms.design_task` - Design task transforms
- `atomworks.ml.transforms.mask_generator` - Mask generation for training
- `atomworks.ml.utils.condition` - Condition utilities
- `atomworks.io.utils.compression` - Compression utilities (zstd support)
New Dataset Classes
- `FileDataset` - Each file is one example (extracted from old monolithic `datasets.py`)
- `PandasDataset` - DataFrame-backed dataset with loader support
New Loader Functions
- `create_base_loader()` - Standard CIF loading
- `create_loader_with_query_pn_units()` - Loading with PN unit queries
- `create_loader_with_interfaces_and_pn_units_to_score()` - Interface scoring loader
New Constants
- `PROTEIN_BACKBONE_ATOM_NAMES` - Backbone atoms including OXT
- `RNA_BACKBONE_ATOM_NAMES` - Sugar-phosphate + 2' hydroxyl atoms
- `DNA_BACKBONE_ATOM_NAMES` - Sugar-phosphate atoms
- `NUCLEIC_ACID_BACKBONE_ATOM_NAMES` - Union of RNA+DNA backbones
- `MASKED` - Token code for masked positions
- `MSAFileExtension` enum - Supported MSA file formats
- Expanded `METAL_ELEMENTS` - Now includes lanthanides and actinides
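The relationship between the backbone constants can be sketched with sets. The atom names below follow standard PDB naming and are assumptions for illustration; the values actually shipped in atomworks may differ, which is why the names carry an `EXAMPLE_` prefix.

```python
# Illustrative values only, not copied from the library.
EXAMPLE_DNA_BACKBONE = frozenset(
    {"P", "OP1", "OP2", "O5'", "C5'", "C4'", "O4'", "C3'", "O3'", "C2'", "C1'"}
)
# RNA: the same sugar-phosphate atoms plus the 2' hydroxyl oxygen.
EXAMPLE_RNA_BACKBONE = EXAMPLE_DNA_BACKBONE | {"O2'"}
# Nucleic acid: union of the RNA and DNA sets.
EXAMPLE_NUCLEIC_ACID_BACKBONE = EXAMPLE_RNA_BACKBONE | EXAMPLE_DNA_BACKBONE

# Typical use: select the backbone atoms from a list of atom names.
atom_names = ["P", "O5'", "C1'", "N9", "O2'", "C8"]
is_backbone = [name in EXAMPLE_NUCLEIC_ACID_BACKBONE for name in atom_names]
```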
New Features
- `AtomArrayPlus` support in parser - Extended atom array with additional metadata
- Spawn multiprocessing support for data loading
- zstd compression support for MSA files
- Atom37 encoding with atomization support
- JSON-level atom selection for bonds argument
Fixed
- Residue starts bug with dependent functions
- SASA calculation for empty amino acid arrays
- Null handling in A3M files
- Design tasks with zero frequency now handled gracefully instead of erroring
- Non-uniform shard sizes handling
- Pickling during data loading with spawn multiprocessing
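The pickling fix relates to a general Python constraint: with the `spawn` start method, every object handed to a worker process must be picklable. The sketch below demonstrates that constraint with the standard library only; the path and helper names are made up and do not describe atomworks internals.

```python
import pickle
from functools import partial
from os.path import join


def is_picklable(obj):
    """Return True if obj survives pickle.dumps (hypothetical helper)."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


# A partial over an importable function is pickled by reference and works.
resolve_path = partial(join, "/data/structures")

# A lambda has no importable name, so it cannot be sent to a spawned worker.
inline_resolve = lambda name: "/data/structures/" + name

partial_ok = is_picklable(resolve_path)
lambda_ok = is_picklable(inline_resolve)
```

Under `fork` the lambda would often appear to work because the child inherits the parent's memory, which is how such bugs tend to stay hidden until `spawn` is enabled.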
Changed
- Loaders module restructured from `loaders.py` to `loaders/` subpackage (imports still work via `__init__.py`)
- Parser cache structure now uses 2-level sharding (old caches automatically regenerated)
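Two-level sharding spreads cache files across nested subdirectories so that no single directory holds too many entries. The following is a minimal sketch of that idea; the function name, the two-character shard width, the `_` padding character, and the `.pkl.gz` suffix are all assumptions, and the real atomworks cache layout may differ.

```python
from pathlib import PurePosixPath


def sharded_cache_path(cache_root, structure_id, suffix=".pkl.gz", width=2):
    """Map an ID to <root>/<level1>/<level2>/<id><suffix> (hypothetical layout)."""
    key = structure_id.lower()
    # Pad short IDs so both shard levels always have `width` characters.
    padded = key.ljust(2 * width, "_")
    level1 = padded[:width]
    level2 = padded[width : 2 * width]
    return PurePosixPath(cache_root) / level1 / level2 / f"{key}{suffix}"


full_id_path = sharded_cache_path("/cache", "1ABC")
short_id_path = sharded_cache_path("/cache", "x1")
```

In this sketch `1ABC` lands in `/cache/1a/bc/` and the short ID `x1` in `/cache/x1/__/`.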
Deprecated
- `atomworks.ml.datasets.parsers` module - Use loaders instead
- `StructuralDatasetWrapper` - Use loader parameter on datasets directly
See CHANGELOG.md for full history.
v1.1.0
1.1.0 (2025-11-29)
Bug Fixes
- a couple of Condition updates (#59) (a0f173b)
- add a flag to optionally tolerate the situation of missing or multiple representative/center atoms per token (#67) (896ca38)
- add errors for cases where parsing an AtomArrayPlus is problematic (80d00e3)
- add padding for short residue names in sharding (94abca1)
- add raise_if_not_set to get_msa_dirs_from_env (0f50eb2)
- add raise_if_not_set to get_msa_dirs_from_env (c7bfee6)
- add within poly res idx on-the-fly option (c80104b)
- address code review issues in performance PR (9cda94a)
- allow for deleting 2d annotations from AtomArrayPlus (84e0084)
- allow numpy masks in addition to query syntax in SampleSeed (3468645)
- allow override in add global token id transform (381a743)
- apptainer (37fe7a2)
- apptainer (e3bd135)
- apptainer (0647b20)
- apptainer for CI (670a82a)
- bcif tests (378d03a)
- Be more robust to nulls in a3m files. (a8552a4)
- better messages and assertions for removing design tasks with 0 frequency (9b6391d)
- broken tests (1fcc397)
- bug in default seq cond mask (00484d4)
- change default sequence condition behavior (981d924)
- ci for internal (5096b6c)
- ci workers (f7da3cd)
- circular import (7eafda9)
- claude code review (885eb53)
- condition set mask and terminus conditions changes (56f661c)
- correct cache dir structure and add padding for short IDs (b8645fc)
- correct sharding path construction for cached residue data (79d388b)
- databases: correcting uniontype call bug (8a3e59e)
- databases: correcting uniontype call bug (ebc26db)
- datasets documentation, DSSP path (1565a94)
- docstring formatting (ff296d6)
- documentation (1e5b0d4)
- documentation, formatting (dcdde14)
- downgrade biotite (cab5bcf)
- enable deletion of 2d annotations (1f88391)
- enable spawn multiprocessing (36ac421)
- ensure that parse preserves AtomArrayPlus status, and add a test for this (681fdeb)
- ensure that the Index condition's default annotation respects its mask (#50) (e57be2a)
- Formatting (f6fe986)
- general masks in SampleSeed (53df9a6)
- give more informative error messages for ConditionalRoute or RandomRoute failures (3b72b18)
- Handle non-uniform shard sizes in AseDBDataset (e34eb51)
- infer array type of TokenEncoding where possible (#68) (a6a8fb1)
- informative Route error messages (87b1fbc)
- minor fixes (f45fafb)
- minor fixes for encodings (e043104)
- more informative error messages (7861023)
- parse preserves atom array plus (d1eef92)
- parser defaults (570f3ce)
- reduce logging level in `load_atom_array_with_conditions_from_cif` (#48) (52f316d)
- remove _spoof (995a260)
- remove ambiguous Greek characters and improve test assertions (bad6dff)
- remove ASE import so we dont introduce a dependency (7e12a8a)
- remove design tasks with zero frequency during sampling instead of erroring (4586fa2)
- remove hardcoded environment-specific default path (24bf03f)
- remove lineprofiler stuff (83fb3c5)
- remove print statements (ecd9e5b)
- residue starts bug with dependent functions. (04da354)
- residue starts bug with dependent functions. (fc252a8)
- rf3: json-level atom selection for bonds argument (a569a7c)
- ruff formatting after merge with dev (71ddb86)
- sasa for empty aa (c2b9302)
- shard cache on structure ID (PDB ID) instead of args hash (e5c29fd)
- Support AtomArrayPlus and AtomArrayPlusStack in parse_atom_array, with some restrictions (#46) ([c1e3b00](c1e3b0096d4d64c8073798...
v1.0.2
v1.0.1
Includes:
- Bug fixes to support RF3 inference
- More complex `AtomSelection` syntax
- Refactoring of datasets
Migration notes:
- We have moved some of the `ml.common` and `io.common` into a single `common` file
- `StructuralDatasetWrapper` now requires a `name` attribute; however, we are deprecating `StructuralDatasetWrapper` in favor of a simple `PandasDataset` or equivalent and will remove the class entirely in a later release
v1.0.0
1.0.0 (2025-08-18)
Bug Fixes
- 3to1 (ab6b4b2)
- adapt naming of regression tests to match new names (c44b387)
- add 'overwrite' option to view_pymol to avoid updating existing structures (#64) (ac0f12d)
- add `make` to apptainer (7cba23e)
- add back readme (#1) (831bc23)
- add back stacking msas by recycle (#2) (fbe0c32)
- add conda init (2e0a0c2)
- add current data to fail log for ease of analysis (da4bdf7)
- add links to the ccd & pdb mirrors (430ae71)
- add missing default (4f020cf)
- add missing test files for local test (68f2e0a)
- add missing transforms in AF3 pipeline (39a465d)
- add new logo and changes of urls to public url (b753e57)
- add test (80b6113)
- add test cases (11dbb61)
- add test coverage bit (57166f5)
- add testpypi setup: (7ded1bf)
- add tests for `fix_formal_charge`, ruff (0a4072d)
- adding badges (4588955)
- address minor pipeline issues in af3 (3943cd7)
- adjust error type on transform history tracking (a468233)
- af3 parsing (#130) (37c6791)
- allow `remove_unsupported_chain_types` to work without specified query_pn_unit_iids. Implement functional API while we're at it. (126b846)
- allow AddRFTemplates to proceed when no `pdb_id` given (c63f10e)
- Allow compatibility with newer rdkit version. (#122) (e6ecbac)
- allow more general covalent bonds (1ef9858)
- allow parsing entries with multiple methods (e.g. `5e5j`) (28ad455)
- allow passing on boolean annotations, allowing distogram bins to be a list (9253102)
- allow processing to continue in the case of covalent bonds between... (88036e4)
- allow saving of failed examples to error, default to a user-based failures path on scratch (c3160de)
- allow unknown users for CI (fa14dda)
- apptainer creation to expose /net (24b8be4)
- apptainer spec (a0c3294)
- arg_fixing: swap coordinates of nh1/nh2 instead of renaming when resolving ARG naming ambiguity, since otherwise charges & bond order are inconsistent (NH2 carries positive charge & double bond by convention) (#41) (8d4b0a6)
- argument error (9c5daba)
- atom level embeddings (#159) (ebaaf51)
- automorphisms (#36) (7cd6ad2)
- avoid building covalent bonds with water or crystallization aids (951a12c)
- bad ligands, new test dataset (0234fab)
- bonds (#125) (2b1a714)
- bug fixes for inference (#46) (e5254d9)
- bug in initializing chain info (7c89186)
- bugfix when using get_residue_starts and general annot_start_stop_idxs, which incorrectly used len() instead of .array_length() to determine the size of an AtomArrayStack (#65) (9b2cc83)
- Bugfixes in get_within_group_res_idx and get_within_poly_res_idx (#121) (4955d19)
- bugs in tests (6b72a3f)
- bugs in using MSAs for inference, supporting MSAs with # headers (f7c2c44)
- build apptainer (ce3c4d6)
- build assembly arguments (905e6b9)
- by default cast aromatic bonds to same order when comparing atom arrays for graph hashes (4587d10)
- cached conformers with chirals (#149) (cec9f83)
- calculate rf2aa chirals off af3 centers (so they are correct) (#114) (64bfca9)
- categories: keep residues not in the CCD instead of converting to UNL (#47) (6a9b0a1)
- chain type miss (0099133)
- chain_id to _iid in Frank's hotfix (9fe6186)
- chains with all resolved tokens (886ffc3)
- changing chain_iid to pn_unit_iid in AF3 features (181467e)
- changing inference ligand residue names to use non-conflicting characters (641f1e6)
- charges (d730b8a)
- chirals ([#105](https://github.com/R...