19 Dec 23:54

github-actions

9b21130

v2.2.0 Latest

2.2.0 (2025-12-19)

Features

add automated mmJSON parsing test and refactor buffer file type inference in io_utils.py. (5db3631)
Add example CIF/JSON data, update dependencies, and modify I/O parsing utilities. (ba00b49)
Add mmjson file type support, update file type inference, and introduce a parsing verification script with example data. (0a995dc)

09 Dec 19:52

nscorley

v2.1.2

c314ddd

v2.1.2

Trigger Zenodo workflow

01 Dec 23:12

nscorley

v2.1.1

33b811a

v2.1.1

Release v2.1.1

Minor fixes to the PyPI build. See the v2.0.0 release notes for details on changes since v1.0.

01 Dec 22:01

github-actions

v2.1.0

2488d84

v2.1.0

2.1.0 (2025-12-01)

Features

hatch build change (#61) (32f577a)

29 Nov 19:42

nscorley

v2.0.0

5811631

v2.0.0

Performance

Parser 2-3x faster: Significant optimizations to structure parsing, especially for symmetric assemblies
Cache loading 3-5x faster: Improved pickle/gzip cache handling with 2-level directory sharding for better filesystem performance
Vectorized annotations: add_pn_unit_iid_annotation() now uses boolean masks instead of expensive subarray operations (10-100x speedup on symmetric assemblies)
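The release does not spell out the cache layout. As a hypothetical illustration of 2-level directory sharding keyed on the structure ID (the v1.1.0 notes mention sharding on the PDB ID and padding short IDs), a cache path could be derived like this. The function name, padding character, and shard slices are assumptions, not atomworks' actual scheme.

```python
from pathlib import Path


def sharded_cache_path(cache_root: str, structure_id: str, suffix: str = ".pkl.gz") -> Path:
    """Build a hypothetical 2-level sharded cache path for one structure.

    Illustrative only: the padding character and the two-character shard
    slices are assumptions, not the layout atomworks actually uses.
    """
    sid = structure_id.lower()
    # Pad IDs shorter than four characters so both shard levels always exist.
    padded = sid.ljust(4, "_")
    # Two directory levels keep each individual directory small.
    level1, level2 = padded[:2], padded[2:4]
    return Path(cache_root) / level1 / level2 / f"{sid}{suffix}"


# Example: sharded_cache_path("/data/cache", "1ABC") -> /data/cache/1a/bc/1abc.pkl.gz
```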

Breaking Changes

Dataset Module Restructuring

The dataset module has been restructured to align with TorchVision/TorchAudio and HuggingFace conventions, using a dataset/loader pattern:

Removed dataset.dataset nesting: Datasets are now flat; access data directly from the dataset object
MetadataRowParser deprecated: The StructuralDatasetWrapper + dataset_parser pattern is replaced with a loader parameter directly on datasets (backwards-compatible but deprecated)

Migration example:

```
# Old (deprecated)
from atomworks.ml.datasets import StructuralDatasetWrapper, PandasDataset
from atomworks.ml.datasets.parsers import PNUnitsDFParser

dataset = StructuralDatasetWrapper(
    dataset=PandasDataset(data="df.parquet"),
    dataset_parser=PNUnitsDFParser(...)
)

# New
from atomworks.ml.datasets import PandasDataset
from atomworks.ml.datasets.loaders import create_base_loader

dataset = PandasDataset(
    data="df.parquet",
    loader=create_base_loader(
        example_id_colname="example_id",
        path_colname="path",
    )
)
```
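To make the dataset/loader split concrete, here is a minimal, library-independent sketch of the pattern: the dataset only indexes metadata rows, and a loader callable turns one row into an example. `ToyTableDataset` and `make_toy_loader` are hypothetical stand-ins, not atomworks classes. The real `create_base_loader` presumably parses the structure file named in the path column, which this sketch skips.

```python
from typing import Any, Callable, Mapping, Optional, Sequence

Row = Mapping[str, Any]
Loader = Callable[[Row], dict]


class ToyTableDataset:
    """Hypothetical stand-in for a table-backed dataset that takes a loader."""

    def __init__(self, rows: Sequence[Row], loader: Optional[Loader] = None):
        self.rows = list(rows)
        self.loader = loader

    def __len__(self) -> int:
        return len(self.rows)

    def __getitem__(self, idx: int) -> Any:
        row = self.rows[idx]
        # No loader: return the raw metadata row. Otherwise delegate to the loader.
        return row if self.loader is None else self.loader(row)


def make_toy_loader(example_id_colname: str, path_colname: str) -> Loader:
    """Hypothetical loader factory taking the same column arguments as create_base_loader."""

    def load(row: Row) -> dict:
        # A real loader would parse the structure file at row[path_colname] here.
        return {"example_id": row[example_id_colname], "path": row[path_colname]}

    return load


rows = [{"example_id": "1abc_A", "path": "structures/1abc.cif"}]
dataset = ToyTableDataset(rows, loader=make_toy_loader("example_id", "path"))
```

Because the loader is an ordinary callable, a different loading strategy can be swapped in without writing a new dataset class.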

Parser Changes

CCD mirror path validation: ccd_mirror_path now raises FileNotFoundError if the path doesn't exist. Pass None explicitly to use Biotite's bundled CCD.
build_assembly="_spoof" removed: Use "all" instead (passing "_spoof" raises a deprecation warning)
convert_mse_to_met default changed: Now True by default (was False)
DEFAULT_PARSE_KWARGS renamed to STANDARD_PARSER_ARGS: Now uses tuples instead of lists for hashability
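Tuples matter for hashability because hashable arguments can serve as cache keys, for example with `functools.lru_cache`. The option names and values in this sketch are hypothetical and do not reproduce the contents of `STANDARD_PARSER_ARGS`.

```python
from functools import lru_cache

# Hypothetical parser options; the tuple value keeps the frozen form hashable.
EXAMPLE_PARSER_ARGS = {
    "keep_hydrogens": False,
    "drop_residues": ("HOH", "SO4"),  # a list here would make the cache key unhashable
}


def freeze(kwargs: dict) -> tuple:
    """Convert a kwargs dict into a sorted tuple of (key, value) pairs."""
    return tuple(sorted(kwargs.items()))


@lru_cache(maxsize=None)
def parse_cached(structure_id: str, frozen_kwargs: tuple) -> str:
    # Stand-in for an expensive parse; lru_cache hashes both arguments.
    return f"parsed {structure_id} with {dict(frozen_kwargs)}"
```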

Environment Changes

Removed automatic .env loading: dotenv is no longer auto-loaded on import. Call load_dotenv() explicitly if needed:
```
from dotenv import load_dotenv
load_dotenv()
```

Removed Exports

monkey_patch_atomarray removed from top-level exports. Use from atomworks.biotite_patch import monkey_patch_biotite instead

Added

New Modules

atomworks.ml.conditions - Unified conditioning management for model training
atomworks.ml.preprocessing.msa - MSA preprocessing (organize, filter, generate)
atomworks.ml.executables - External executable management (hbplus, hhfilter, mmseqs2, x3dna)
atomworks.ml.transforms.design_task - Design task transforms
atomworks.ml.transforms.mask_generator - Mask generation for training
atomworks.ml.utils.condition - Condition utilities
atomworks.io.utils.compression - Compression utilities (zstd support)

New Dataset Classes

FileDataset - Each file is one example (extracted from old monolithic datasets.py)
PandasDataset - DataFrame-backed dataset with loader support

New Loader Functions

create_base_loader() - Standard CIF loading
create_loader_with_query_pn_units() - Loading with PN unit queries
create_loader_with_interfaces_and_pn_units_to_score() - Interface scoring loader

New Constants

PROTEIN_BACKBONE_ATOM_NAMES - Backbone atoms including OXT
RNA_BACKBONE_ATOM_NAMES - Sugar-phosphate + 2' hydroxyl atoms
DNA_BACKBONE_ATOM_NAMES - Sugar-phosphate atoms
NUCLEIC_ACID_BACKBONE_ATOM_NAMES - Union of RNA+DNA backbones
MASKED - Token code for masked positions
MSAFileExtension enum - Supported MSA file formats
Expanded METAL_ELEMENTS - Now includes lanthanides and actinides
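As an illustration of how the nucleic-acid constants relate, here is a sketch built from standard PDB atom names. The exact membership of the atomworks constants is an assumption here and may differ, for example in whether C1' or OP3 is included.

```python
# Illustrative atom-name sets; the actual atomworks constants may differ.
PROTEIN_BACKBONE = ("N", "CA", "C", "O", "OXT")

# Sugar-phosphate backbone atoms shared by DNA and RNA (PDB naming).
DNA_BACKBONE = ("P", "OP1", "OP2", "O5'", "C5'", "C4'", "O4'", "C3'", "O3'", "C2'", "C1'")

# RNA adds the 2' hydroxyl oxygen on the ribose.
RNA_BACKBONE = DNA_BACKBONE + ("O2'",)

# Union of both sets, preserving order and dropping duplicates.
NUCLEIC_ACID_BACKBONE = tuple(dict.fromkeys(RNA_BACKBONE + DNA_BACKBONE))


def backbone_mask(atom_names, backbone=PROTEIN_BACKBONE):
    """Return a list of booleans marking which atom names belong to the backbone set."""
    allowed = set(backbone)
    return [name in allowed for name in atom_names]
```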

New Features

AtomArrayPlus support in parser - Extended atom array with additional metadata
Spawn multiprocessing support for data loading
zstd compression support for MSA files
Atom37 encoding with atomization support
JSON-level atom selection for bonds argument
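Atom37 is the fixed-slot layout used by AlphaFold, in which each residue's heavy atoms are written into 37 named slots alongside a presence mask. The sketch below follows the AlphaFold slot order. Whether atomworks uses the same order, and how its atomization support treats atoms without a slot, are not covered by these notes, so the hypothetical `encode_atom37` simply returns such atoms separately.

```python
# Slot order as in AlphaFold's atom37 convention (assumed, not taken from atomworks).
ATOM37_NAMES = (
    "N", "CA", "C", "CB", "O", "CG", "CG1", "CG2", "OG", "OG1", "SG", "CD",
    "CD1", "CD2", "ND1", "ND2", "OD1", "OD2", "SD", "CE", "CE1", "CE2", "CE3",
    "NE", "NE1", "NE2", "OE1", "OE2", "CH2", "NH1", "NH2", "OH", "CZ", "CZ2",
    "CZ3", "NZ", "OXT",
)
ATOM37_INDEX = {name: i for i, name in enumerate(ATOM37_NAMES)}

Coord = tuple[float, float, float]


def encode_atom37(residue_atoms: dict[str, Coord]) -> tuple[list[Coord], list[bool], list[str]]:
    """Place one residue's atoms into 37 fixed slots plus a presence mask.

    Atoms with no atom37 slot are returned separately instead of being dropped.
    """
    coords: list[Coord] = [(0.0, 0.0, 0.0)] * len(ATOM37_NAMES)
    mask = [False] * len(ATOM37_NAMES)
    unplaced = []
    for name, xyz in residue_atoms.items():
        idx = ATOM37_INDEX.get(name)
        if idx is None:
            unplaced.append(name)
            continue
        coords[idx] = xyz
        mask[idx] = True
    return coords, mask, unplaced
```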

Fixed

Residue starts bug with dependent functions
SASA calculation for empty amino acid arrays
Null handling in A3M files
Design tasks with zero frequency now handled gracefully instead of erroring
Handling of non-uniform shard sizes in AseDBDataset
Pickling during data loading with spawn multiprocessing
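A minimal sketch of the zero-frequency fix, assuming design tasks are sampled by relative frequency: tasks with zero frequency are filtered out before sampling, and an error is raised only when no task remains. `sample_design_task` and the task names are hypothetical.

```python
import random
from typing import Mapping, Optional


def sample_design_task(
    frequencies: Mapping[str, float], rng: Optional[random.Random] = None
) -> str:
    """Pick one design task in proportion to its frequency, ignoring zero entries."""
    if rng is None:
        rng = random.Random()
    # Keep only tasks that can actually be drawn (non-positive weights are skipped).
    active = {name: freq for name, freq in frequencies.items() if freq > 0}
    if not active:
        raise ValueError(f"no design task has a positive frequency: {sorted(frequencies)}")
    names = list(active)
    return rng.choices(names, weights=[active[n] for n in names], k=1)[0]
```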

Changed

Loaders module restructured from loaders.py to loaders/ subpackage (imports still work via __init__.py)
Parser cache structure now uses 2-level sharding (old caches automatically regenerated)

Deprecated

atomworks.ml.datasets.parsers module - Use loaders instead
StructuralDatasetWrapper - Use loader parameter on datasets directly

See CHANGELOG.md for the full history.

29 Nov 19:29

github-actions

v1.1.0

7a56351

v1.1.0

1.1.0 (2025-11-29)

Bug Fixes

a couple of Condition updates (#59) (a0f173b)
add a flag to optionally tolerate the situation of missing or multiple representative/center atoms per token (#67) (896ca38)
add errors for cases where parsing an AtomArrayPlus is problematic (80d00e3)
add padding for short residue names in sharding (94abca1)
add raise_if_not_set to get_msa_dirs_from_env (0f50eb2)
add raise_if_not_set to get_msa_dirs_from_env (c7bfee6)
add within poly res idx on-the-fly option (c80104b)
address code review issues in performance PR (9cda94a)
allow for deleting 2d annotations from AtomArrayPlus (84e0084)
allow numpy masks in addition to query syntax in SampleSeed (3468645)
allow override in add global token id transform (381a743)
apptainer (37fe7a2)
apptainer (e3bd135)
apptainer (0647b20)
apptainer for CI (670a82a)
bcif tests (378d03a)
Be more robust to nulls in a3m files. (a8552a4)
better messages and assertions for removing design tasks with 0 frequency (9b6391d)
broken tests (1fcc397)
bug in default seq cond mask (00484d4)
change default sequence condition behavior (981d924)
ci for internal (5096b6c)
ci workers (f7da3cd)
circular import (7eafda9)
claude code review (885eb53)
condition set mask and terminus conditions changes (56f661c)
correct cache dir structure and add padding for short IDs (b8645fc)
correct sharding path construction for cached residue data (79d388b)
databases: correcting uniontype call bug (8a3e59e)
databases: correcting uniontype call bug (ebc26db)
datasets documentation, DSSP path (1565a94)
docstring formatting (ff296d6)
documentation (1e5b0d4)
documentation, formatting (dcdde14)
downgrade biotite (cab5bcf)
enable deletion of 2d annotations (1f88391)
enable spawn multiprocessing (36ac421)
ensure that parse preserves AtomArrayPlus status, and add a test for this (681fdeb)
ensure that the Index condition's default annotation respects its mask (#50) (e57be2a)
Formatting (f6fe986)
general masks in SampleSeed (53df9a6)
give more informative error messages for ConditionalRoute or RandomRoute failures (3b72b18)
Handle non-uniform shard sizes in AseDBDataset (e34eb51)
infer array type of TokenEncoding where possible (#68) (a6a8fb1)
informative Route error messages (87b1fbc)
minor fixes (f45fafb)
minor fixes for encodings (e043104)
more informative error messages (7861023)
parse preserves atom array plus (d1eef92)
parser defaults (570f3ce)
reduce logging level in load_atom_array_with_conditions_from_cif (#48) (52f316d)
remove _spoof (995a260)
remove ambiguous Greek characters and improve test assertions (bad6dff)
remove ASE import so we don't introduce a dependency (7e12a8a)
remove design tasks with zero frequency during sampling instead of erroring (4586fa2)
remove hardcoded environment-specific default path (24bf03f)
remove lineprofiler stuff (83fb3c5)
remove print statements (ecd9e5b)
residue starts bug with dependent functions. (04da354)
residue starts bug with dependent functions. (fc252a8)
rf3: json-level atom selection for bonds argument (a569a7c)
ruff formatting after merge with dev (71ddb86)
sasa for empty aa (c2b9302)
shard cache on structure ID (PDB ID) instead of args hash (e5c29fd)
Support AtomArrayPlus and AtomArrayPlusStack in parse_atom_array, with some restrictions (#46) ([c1e3b00](c1e3b0096d4d64c8073798...

18 Sep 02:51

github-actions

v1.0.2

add61b9

v1.0.2

1.0.2 (2025-09-18)

Bug Fixes

update homepage and documentation URLs in pyproject.toml (4591c98)

18 Sep 02:03

nscorley

v1.0.1

6450232

v1.0.1

Includes:

Bug fixes to support RF3 inference
More complex AtomSelection syntax
Refactoring of datasets

Migration notes:

We have moved parts of ml.common and io.common into a single common file
StructuralDatasetWrapper now requires a name attribute; however, we are deprecating StructuralDatasetWrapper in favor of a simple PandasDataset or equivalent and will remove the class entirely in a later release

18 Aug 18:08

github-actions

v1.0.0

3c2fd43

v1.0.0

1.0.0 (2025-08-18)

BREAKING CHANGE: update to cifutils 2.0 (#50) (77dd6fd)

Bug Fixes

3to1 (ab6b4b2)
adapt naming of regression tests to match new names (c44b387)
add 'overwrite' option to view_pymol to avoid updating existing structures (#64) (ac0f12d)
add make to apptainer (7cba23e)
add back readme (#1) (831bc23)
add back stacking msas by recycle (#2) (fbe0c32)
add conda init (2e0a0c2)
add current data to fail log for ease of analysis (da4bdf7)
add links to the ccd & pdb mirrors (430ae71)
add missing default (4f020cf)
add missing test files for local test (68f2e0a)
add missing transforms in AF3 pipeline (39a465d)
add new logo and changes of urls to public url (b753e57)
add test (80b6113)
add test cases (11dbb61)
add test coverage bit (57166f5)
add testpypi setup: (7ded1bf)
add tests for fix_formal_charge, ruff (0a4072d)
adding badges (4588955)
address minor pipeline issues in af3 (3943cd7)
adjust error type on transform history tracking (a468233)
af3 parsing (#130) (37c6791)
allow remove_unsupported_chain_types to work without specified query_pn_unit_iids. Implement functional API while we're at it. (126b846)
allow AddRFTemplates to proceed when no pdb_id given (c63f10e)
Allow compatibility with newer rdkit version. (#122) (e6ecbac)
allow more general covalent bonds (1ef9858)
allow parsing entries with multiple methods (e.g. 5e5j) (28ad455)
allow passing on boolean annotations, allowing distogram bins to be a list (9253102)
allow processing to continue in the case of covalent bonds between... (88036e4)
allow saving of failed examples to error, default to a user-based failures path on scratch (c3160de)
allow unknown users for CI (fa14dda)
apptainer creation to expose /net (24b8be4)
apptainer spec (a0c3294)
arg_fixing: swap coordinates of nh1/nh2 instead of renaming when resolving ARG naming ambiguity, since otherwise charges & bond order are inconsistent (NH2 carries positive charge & double bond by convention) (#41) (8d4b0a6)
argument error (9c5daba)
atom level embeddings (#159) (ebaaf51)
automorphisms (#36) (7cd6ad2)
avoid building covalent bonds with water or crystallization aids (951a12c)
bad ligands, new test dataset (0234fab)
bonds (#125) (2b1a714)
bug fixes for inference (#46) (e5254d9)
bug in initializing chain info (7c89186)
bugfix when using get_residue_starts and general annot_start_stop_idxs, which incorrectly used len() instead of .array_length() to determine the size of an AtomArrayStack (#65) (9b2cc83)
Bugfixes in get_within_group_res_idx and get_within_poly_res_idx (#121) (4955d19)
bugs in tests (6b72a3f)
bugs in using MSAs for inference, supporting MSAs with # headers (f7c2c44)
build apptainer (ce3c4d6)
build assembly arguments (905e6b9)
by default cast aromatic bonds to same order when comparing atom arrays for graph hashes (4587d10)
cached conformers with chirals (#149) (cec9f83)
calculate rf2aa chirals off af3 centers (so they are correct) (#114) (64bfca9)
categories: keep residues not in the CCD instead of converting to UNL (#47) (6a9b0a1)
chain type miss (0099133)
chain_id to _iid in Frank's hotfix (9fe6186)
chains with all resolved tokens (886ffc3)
changing chain_iid to pn_unit_iid in AF3 features (181467e)
changing inference ligand residue names to use non-conflicting characters (641f1e6)
charges (d730b8a)
chirals ([#105](https://github.com/R...
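The arg_fixing entry above resolves the ARG naming ambiguity by swapping coordinates rather than labels. A hypothetical sketch on a plain name-to-coordinate mapping (the function name and data layout are assumptions, and the test for when a swap is needed is omitted):

```python
Coord = tuple[float, float, float]


def swap_arg_nh_coords(atoms: dict[str, Coord]) -> dict[str, Coord]:
    """Return a copy of an ARG residue with the NH1 and NH2 positions exchanged.

    Names keep their chemical roles, so NH2 still carries the formal positive
    charge and the double bond to CZ; only the coordinates move.
    """
    fixed = dict(atoms)
    fixed["NH1"], fixed["NH2"] = atoms["NH2"], atoms["NH1"]
    return fixed
```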

Releases: RosettaCommons/atomworks

v2.2.0

2.2.0 (2025-12-19)

Features

Uh oh!

v2.1.2

Uh oh!

v2.1.1

Uh oh!

v2.1.0

2.1.0 (2025-12-01)

Features

Uh oh!

v2.0.0

Performance

Breaking Changes

Dataset Module Restructuring

Parser Changes

Environment Changes

Removed Exports

Added

New Modules

New Dataset Classes

New Loader Functions

New Constants

New Features

Fixed

Changed

Deprecated

Uh oh!

v1.1.0

1.1.0 (2025-11-29)

Bug Fixes

Uh oh!

v1.0.2

1.0.2 (2025-09-18)

Bug Fixes

Uh oh!

v1.0.1

Uh oh!

v1.0.0

1.0.0 (2025-08-18)

Bug Fixes

Uh oh!