Releases: RosettaCommons/atomworks

v2.2.0

19 Dec 23:54

2.2.0 (2025-12-19)

Features

  • add automated mmJSON parsing test and refactor buffer file type inference in io_utils.py. (5db3631)
  • Add example CIF/JSON data, update dependencies, and modify I/O parsing utilities. (ba00b49)
  • Add mmjson file type support, update file type inference, and introduce a parsing verification script with example data. (0a995dc)

v2.1.2

09 Dec 19:52

Trigger Zenodo workflow

v2.1.1

01 Dec 23:12

Release v2.1.1

Minor fixes to the build for PyPI. Please see the v2.0.0 release notes for details on changes since v1.0.

v2.1.0

01 Dec 22:01

2.1.0 (2025-12-01)

Features

v2.0.0

29 Nov 19:42

Performance

  • Parser 2-3x faster: Significant optimizations to structure parsing, especially for symmetric assemblies
  • Cache loading 3-5x faster: Improved pickle/gzip cache handling with 2-level directory sharding for better filesystem performance
  • Vectorized annotations: add_pn_unit_iid_annotation() now uses boolean masks instead of expensive subarray operations (10-100x speedup on symmetric assemblies)
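
The boolean-mask approach mentioned above can be illustrated with a generic NumPy sketch. This is not the atomworks implementation of add_pn_unit_iid_annotation(); the array names, the label format, and the helper function are hypothetical and only show why writing through a mask avoids building sub-arrays.

```python
import numpy as np

# Hypothetical per-atom labels for a small symmetric assembly (not atomworks data).
unit_ids = np.array(["A", "A", "B", "A", "A", "B"])
transform_ids = np.array(["1", "1", "1", "2", "2", "2"])


def annotate_with_masks(unit_ids, transform_ids):
    """Write one instance label per (unit, transform) pair using boolean masks."""
    labels = np.empty(len(unit_ids), dtype=object)
    for unit in np.unique(unit_ids):
        for transform in np.unique(transform_ids):
            # One boolean mask selects every matching atom. The assignment writes
            # into the full-length array, so no sub-array is extracted or rebuilt.
            mask = (unit_ids == unit) & (transform_ids == transform)
            labels[mask] = f"{unit}_{transform}"
    return labels


instance_ids = annotate_with_masks(unit_ids, transform_ids)
print(instance_ids.tolist())
```

The cost of each mask is one pass over the atom arrays, which is typically much cheaper than slicing out and re-concatenating per-unit sub-arrays when an assembly contains many symmetric copies.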

Breaking Changes

Dataset Module Restructuring

The dataset module has been restructured to align with TorchVision/TorchAudio and HuggingFace conventions, using a dataset/loader pattern:

  • Removed dataset.dataset nesting: Datasets are now flat; access data directly from the dataset object
  • MetadataRowParser deprecated: The StructuralDatasetWrapper + dataset_parser pattern is replaced with a loader parameter directly on datasets (backwards-compatible but deprecated)

Migration example:

# Old (deprecated)
from atomworks.ml.datasets import StructuralDatasetWrapper, PandasDataset
from atomworks.ml.datasets.parsers import PNUnitsDFParser

dataset = StructuralDatasetWrapper(
    dataset=PandasDataset(data="df.parquet"),
    dataset_parser=PNUnitsDFParser(...)
)

# New
from atomworks.ml.datasets import PandasDataset
from atomworks.ml.datasets.loaders import create_base_loader

dataset = PandasDataset(
    data="df.parquet",
    loader=create_base_loader(
        example_id_colname="example_id",
        path_colname="path",
    )
)

Parser Changes

  • CCD mirror path validation: ccd_mirror_path now raises FileNotFoundError if the path doesn't exist. Pass None explicitly to use Biotite's bundled CCD
  • build_assembly="_spoof" removed: Use "all" instead (raises deprecation warning)
  • convert_mse_to_met default changed: Now True by default (was False)
  • STANDARD_PARSER_ARGS renamed: Was DEFAULT_PARSE_KWARGS; now uses tuples instead of lists for hashability
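
Why tuples matter for hashability can be shown with plain Python. The sketch below does not reproduce the contents of STANDARD_PARSER_ARGS; the key hypothetical_excluded_residues and the cache_key helper are made up for illustration. Only build_assembly="all" comes from the notes above.

```python
# Hypothetical keyword arguments, one using a list and one using a tuple.
args_with_list = {"hypothetical_excluded_residues": ["HOH", "DOD"], "build_assembly": "all"}
args_with_tuple = {"hypothetical_excluded_residues": ("HOH", "DOD"), "build_assembly": "all"}


def cache_key(kwargs):
    """Build an order-independent hash from keyword arguments."""
    # Sorting by key makes the result independent of insertion order.
    # hash() requires every value to be hashable, which lists are not.
    return hash(tuple(sorted(kwargs.items())))


tuple_key = cache_key(args_with_tuple)  # works: all values are hashable

try:
    cache_key(args_with_list)
    list_is_hashable = True
except TypeError:
    # Raised because a list value cannot be hashed.
    list_is_hashable = False

print(list_is_hashable)
```

Arguments that hash cleanly can be used as dictionary keys or as inputs to functools.lru_cache, which is one plausible reason for the change.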

Environment Changes

  • Removed automatic .env loading: dotenv is no longer auto-loaded on import. Call load_dotenv() explicitly if needed:
    from dotenv import load_dotenv
    load_dotenv()

Removed Exports

  • monkey_patch_atomarray removed from top-level exports. Use from atomworks.biotite_patch import monkey_patch_biotite instead

Added

New Modules

  • atomworks.ml.conditions - Unified conditioning management for model training
  • atomworks.ml.preprocessing.msa - MSA preprocessing (organize, filter, generate)
  • atomworks.ml.executables - External executable management (hbplus, hhfilter, mmseqs2, x3dna)
  • atomworks.ml.transforms.design_task - Design task transforms
  • atomworks.ml.transforms.mask_generator - Mask generation for training
  • atomworks.ml.utils.condition - Condition utilities
  • atomworks.io.utils.compression - Compression utilities (zstd support)

New Dataset Classes

  • FileDataset - Each file is one example (extracted from old monolithic datasets.py)
  • PandasDataset - DataFrame-backed dataset with loader support

New Loader Functions

  • create_base_loader() - Standard CIF loading
  • create_loader_with_query_pn_units() - Loading with PN unit queries
  • create_loader_with_interfaces_and_pn_units_to_score() - Interface scoring loader

New Constants

  • PROTEIN_BACKBONE_ATOM_NAMES - Backbone atoms including OXT
  • RNA_BACKBONE_ATOM_NAMES - Sugar-phosphate + 2' hydroxyl atoms
  • DNA_BACKBONE_ATOM_NAMES - Sugar-phosphate atoms
  • NUCLEIC_ACID_BACKBONE_ATOM_NAMES - Union of RNA+DNA backbones
  • MASKED - Token code for masked positions
  • MSAFileExtension enum - Supported MSA file formats
  • Expanded METAL_ELEMENTS - Now includes lanthanides and actinides

New Features

  • AtomArrayPlus support in parser - Extended atom array with additional metadata
  • Spawn multiprocessing support for data loading
  • zstd compression support for MSA files
  • Atom37 encoding with atomization support
  • JSON-level atom selection for bonds argument

Fixed

  • Residue starts bug with dependent functions
  • SASA calculation for empty amino acid arrays
  • Null handling in A3M files
  • Design tasks with zero frequency now handled gracefully instead of erroring
  • Non-uniform shard sizes handling
  • Pickling during data loading with spawn multiprocessing
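
As background for the spawn-related fix: with the spawn start method, objects handed to worker processes are pickled, so anything unpicklable (such as a lambda) fails at that point. The sketch below is a generic standard-library check and does not describe the specific bug fixed in atomworks; is_picklable and both example loaders are hypothetical.

```python
import pickle
from functools import partial


def is_picklable(obj):
    """Return True if obj survives pickle.dumps, as spawn-based workers require."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# A partial over an importable callable pickles by reference to that callable.
loader_from_partial = partial(int, base=2)

# A lambda has no importable name, so pickle cannot reference it.
loader_from_lambda = lambda text: int(text, base=2)

print(is_picklable(loader_from_partial))
print(is_picklable(loader_from_lambda))
```

A check like this can be run on a loader or transform before starting workers, so that the failure surfaces in the parent process with a clear cause.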

Changed

  • Loaders module restructured from loaders.py to loaders/ subpackage (imports still work via __init__.py)
  • Parser cache structure now uses 2-level sharding (old caches automatically regenerated)
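
A minimal sketch of what a 2-level sharded cache path can look like. The exact layout used by the parser is not stated in these notes, so the choice of characters for each level, the padding character, the file suffix, and the function name are all assumptions.

```python
from pathlib import Path


def sharded_cache_path(cache_root, structure_id, suffix=".pkl.gz", pad_char="_"):
    """Return a cache path nested under two shard directories.

    Assumed layout (not confirmed by the release notes): the first two
    characters of the ID name the outer directory, the next two the inner one.
    """
    # Pad short IDs so that both shard levels always have two characters.
    key = structure_id.lower().ljust(4, pad_char)
    return Path(cache_root) / key[:2] / key[2:4] / f"{structure_id.lower()}{suffix}"


path = sharded_cache_path("cache", "6LYZ")
short_path = sharded_cache_path("cache", "ab")
print(path.as_posix())
print(short_path.as_posix())
```

Spreading entries across nested directories keeps any single directory small, which tends to help listing and lookup times on filesystems that slow down with very large directories.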

Deprecated

  • atomworks.ml.datasets.parsers module - Use loaders instead
  • StructuralDatasetWrapper - Use loader parameter on datasets directly

See CHANGELOG.md for full history.

v1.1.0

29 Nov 19:29

1.1.0 (2025-11-29)

Bug Fixes

  • a couple of Condition updates (#59) (a0f173b)
  • add a flag to optionally tolerate the situation of missing or multiple representative/center atoms per token (#67) (896ca38)
  • add errors for cases where parsing an AtomArrayPlus is problematic (80d00e3)
  • add padding for short residue names in sharding (94abca1)
  • add raise_if_not_set to get_msa_dirs_from_env (0f50eb2)
  • add raise_if_not_set to get_msa_dirs_from_env (c7bfee6)
  • add within poly res idx on-the-fly option (c80104b)
  • address code review issues in performance PR (9cda94a)
  • allow for deleting 2d annotations from AtomArrayPlus (84e0084)
  • allow numpy masks in addition to query syntax in SampleSeed (3468645)
  • allow override in add global token id transform (381a743)
  • apptainer (37fe7a2)
  • apptainer (e3bd135)
  • apptainer (0647b20)
  • apptainer for CI (670a82a)
  • bcif tests (378d03a)
  • Be more robust to nulls in a3m files. (a8552a4)
  • better messages and assertions for removing design tasks with 0 frequency (9b6391d)
  • broken tests (1fcc397)
  • bug in default seq cond mask (00484d4)
  • change default sequence condition behavior (981d924)
  • ci for internal (5096b6c)
  • ci workers (f7da3cd)
  • circular import (7eafda9)
  • claude code review (885eb53)
  • condition set mask and terminus conditions changes (56f661c)
  • correct cache dir structure and add padding for short IDs (b8645fc)
  • correct sharding path construction for cached residue data (79d388b)
  • databases: correcting uniontype call bug (8a3e59e)
  • databases: correcting uniontype call bug (ebc26db)
  • datasets documentation, DSSP path (1565a94)
  • docstring formatting (ff296d6)
  • documentation (1e5b0d4)
  • documentation, formatting (dcdde14)
  • downgrade biotite (cab5bcf)
  • enable deletion of 2d annotations (1f88391)
  • enable spawn multiprocessing (36ac421)
  • ensure that parse preserves AtomArrayPlus status, and add a test for this (681fdeb)
  • ensure that the Index condition's default annotation respects its mask (#50) (e57be2a)
  • Formatting (f6fe986)
  • general masks in SampleSeed (53df9a6)
  • give more informative error messages for ConditionalRoute or RandomRoute failures (3b72b18)
  • Handle non-uniform shard sizes in AseDBDataset (e34eb51)
  • infer array type of TokenEncoding where possible (#68) (a6a8fb1)
  • informative Route error messages (87b1fbc)
  • minor fixes (f45fafb)
  • minor fixes for encodings (e043104)
  • more informative error messages (7861023)
  • parse preserves atom array plus (d1eef92)
  • parser defaults (570f3ce)
  • reduce logging level in load_atom_array_with_conditions_from_cif (#48) (52f316d)
  • remove _spoof (995a260)
  • remove ambiguous Greek characters and improve test assertions (bad6dff)
  • remove ASE import so we dont introduce a dependency (7e12a8a)
  • remove design tasks with zero frequency during sampling instead of erroring (4586fa2)
  • remove hardcoded environment-specific default path (24bf03f)
  • remove lineprofiler stuff (83fb3c5)
  • remove print statements (ecd9e5b)
  • residue starts bug with dependent functions. (04da354)
  • residue starts bug with dependent functions. (fc252a8)
  • rf3: json-level atom selection for bonds argument (a569a7c)
  • ruff formatting after merge with dev (71ddb86)
  • sasa for empty aa (c2b9302)
  • shard cache on structure ID (PDB ID) instead of args hash (e5c29fd)
  • Support AtomArrayPlus and AtomArrayPlusStack in parse_atom_array, with some restrictions (#46) ([c1e3b00](c1e3b0096d4d64c8073798...

v1.0.2

18 Sep 02:51

1.0.2 (2025-09-18)

Bug Fixes

  • update homepage and documentation URLs in pyproject.toml (4591c98)

v1.0.1

18 Sep 02:03

Includes:

  • Bug fixes to support RF3 inference
  • More complex AtomSelection syntax
  • Refactoring of datasets

Migration notes:

  • We have moved some of the contents of ml.common and io.common into a single common file
  • StructuralDatasetWrapper now requires a name attribute; however, we are deprecating StructuralDatasetWrapper in favor of a simple PandasDataset or equivalent and will remove the class entirely in a later release

v1.0.0

18 Aug 18:08
3c2fd43

1.0.0 (2025-08-18)

  • BREAKING CHANGE: update to cifutils 2.0 (#50) (77dd6fd)

Bug Fixes

  • 3to1 (ab6b4b2)
  • adapt naming of regression tests to match new names (c44b387)
  • add 'overwrite' option to view_pymol to avoid updating existing structures (#64) (ac0f12d)
  • add make to apptainer (7cba23e)
  • add back readme (#1) (831bc23)
  • add back stacking msas by recycle (#2) (fbe0c32)
  • add conda init (2e0a0c2)
  • add current data to fail log for ease of analysis (da4bdf7)
  • add links to the ccd & pdb mirrors (430ae71)
  • add missing default (4f020cf)
  • add missing test files for local test (68f2e0a)
  • add missing transforms in AF3 pipeline (39a465d)
  • add new logo and changes of urls to public url (b753e57)
  • add test (80b6113)
  • add test cases (11dbb61)
  • add test coverage bit (57166f5)
  • add testpypi setup: (7ded1bf)
  • add tests for fix_formal_charge, ruff (0a4072d)
  • adding badges (4588955)
  • address minor pipeline issues in af3 (3943cd7)
  • adjust error type on transform history tracking (a468233)
  • af3 parsing (#130) (37c6791)
  • allow remove_unsupported_chain_types to work without specified query_pn_unit_iids. Implement functional API while we're at it. (126b846)
  • allow AddRFTemplates to proceed when no pdb_id given (c63f10e)
  • Allow compatibility with newer rdkit version. (#122) (e6ecbac)
  • allow more general covalent bonds (1ef9858)
  • allow parsing entries with multiple methods (e.g. 5e5j) (28ad455)
  • allow passing on boolean annotations, allowing distogram bins to be a list (9253102)
  • allow processing to continue in the case of covalent bonds between... (88036e4)
  • allow saving of failed examples to error, default to a user-based failures path on scratch (c3160de)
  • allow unknown users for CI (fa14dda)
  • apptainer creation to expose /net (24b8be4)
  • apptainer spec (a0c3294)
  • arg_fixing: swap coordinates of nh1/nh2 instead of renaming when resolving ARG naming ambiguity, since otherwise charges & bond order are inconsistent (NH2 carries positive charge & double bond by convention) (#41) (8d4b0a6)
  • argument error (9c5daba)
  • atom level embeddings (#159) (ebaaf51)
  • automorphisms (#36) (7cd6ad2)
  • avoid building covalent bonds with water or crystallization aids (951a12c)
  • bad ligands, new test dataset (0234fab)
  • bonds (#125) (2b1a714)
  • bug fixes for inference (#46) (e5254d9)
  • bug in initializing chain info (7c89186)
  • bugfix when using get_residue_starts and general annot_start_stop_idxs, which incorrectly used len() instead of .array_length() to determine the size of an AtomArrayStack (#65) (9b2cc83)
  • Bugfixes in get_within_group_res_idx and get_within_poly_res_idx (#121) (4955d19)
  • bugs in tests (6b72a3f)
  • bugs in using MSAs for inference, supporting MSAs with # headers (f7c2c44)
  • build apptainer (ce3c4d6)
  • build assembly arguments (905e6b9)
  • by default cast aromatic bonds to same order when comparing atom arrays for graph hashes (4587d10)
  • cached conformers with chirals (#149) (cec9f83)
  • calculate rf2aa chirals off af3 centers (so they are correct) (#114) (64bfca9)
  • categories: keep residues not in the CCD instead of converting to UNL (#47) (6a9b0a1)
  • chain type miss (0099133)
  • chain_id to _iid in Frank's hotfix (9fe6186)
  • chains with all resolved tokens (886ffc3)
  • changing chain_iid to pn_unit_iid in AF3 features (181467e)
  • changing inference ligand residue names to use non-conflicting characters (641f1e6)
  • charges (d730b8a)
  • chirals ([#105](https://github.com/R...