This diffusion model predicts the structure of a peptide (p) within the pocket of a major histocompatibility complex (MHC).
- It predicts C-alpha positions, starting from Gaussian noise.
- It predicts the orientation of the N, C and C-beta atoms around C-alpha, starting from a random rotation quaternion.
- It predicts the torsion angles of the side chain atoms, starting from random angles.
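The three starting points listed above can be sketched as follows. This is only an illustration, not the code used in this repository: the function name sample_initial_noise, the unit noise scale and the choice of 7 torsion angles per residue (taken from the torsion_angles_sin_cos shape in the input format) are assumptions.

```python
import numpy as np


def sample_initial_noise(n_residues, n_torsions=7, seed=None):
    """Draw random starting values for a peptide of n_residues amino acids.

    Hypothetical helper for illustration; the real model may scale or
    parameterize its noise differently.
    """
    rng = np.random.default_rng(seed)

    # C-alpha positions: standard Gaussian noise, shape (P, 3).
    positions = rng.standard_normal((n_residues, 3))

    # Orientations: normalizing a 4D Gaussian sample yields a unit
    # quaternion that is uniformly distributed over all rotations.
    q = rng.standard_normal((n_residues, 4))
    quaternions = q / np.linalg.norm(q, axis=-1, keepdims=True)

    # Torsion angles: uniform in [-pi, pi), shape (P, n_torsions).
    torsions = rng.uniform(-np.pi, np.pi, size=(n_residues, n_torsions))

    return positions, quaternions, torsions


if __name__ == "__main__":
    pos, quat, tor = sample_initial_noise(9, seed=0)
    print(pos.shape, quat.shape, tor.shape)
```

A trained network would then refine these values step by step over the noise schedule; that part is not shown here.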
Requirements:
- PyTorch 2.0.0
- h5py 3.11.0
- NumPy 2.0.0
- Biopython 1.84
- OpenFold 0.0.1
Input data is in the same HDF5 format that SwiftMHC uses, so the output of SwiftMHC preprocessing can be used as input data (https://github.com/x-lab-3d/swiftmhc).
Format:
HDF5 file:
|
+ -- complex 1:
|      + -- name (str)
|      |
|      + -- peptide
|      |      |
|      |      + -- backbone_rigid_tensor (P x 4 x 4)
|      |      + -- aatype (P)
|      |      + -- sequence_onehot (P x 22)
|      |      + -- torsion_angles_sin_cos (P x 7 x 2)
|      |      + -- torsion_angles_mask (P x 7)
|      |
|      + -- protein
|             |
|             + -- backbone_rigid_tensor (M x 4 x 4)
|             + -- aatype (M)
|             + -- sequence_onehot (M x 22)
|             + -- atom14_gt_positions (M x 14 x 3)
|             + -- atom14_gt_exists (M x 14)
|             + -- cross_residues_mask (M)
+ -- complex 2:
+ ....
Where M is the number of amino acids in the MHC and P is the number of amino acids in the peptide.
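To check that a file follows this layout before training, something like the sketch below may help. It is not part of this repository: EXPECTED, check_complex and check_file are hypothetical names, and the check only compares dataset shapes against the format described above. It relies on h5py groups behaving like dictionaries and on datasets exposing a .shape attribute.

```python
# Expected dataset shapes per group. "P" and "M" stand for the peptide and
# MHC lengths; they are inferred from the first dataset found in each group.
EXPECTED = {
    "peptide": {
        "backbone_rigid_tensor": ("P", 4, 4),
        "aatype": ("P",),
        "sequence_onehot": ("P", 22),
        "torsion_angles_sin_cos": ("P", 7, 2),
        "torsion_angles_mask": ("P", 7),
    },
    "protein": {
        "backbone_rigid_tensor": ("M", 4, 4),
        "aatype": ("M",),
        "sequence_onehot": ("M", 22),
        "atom14_gt_positions": ("M", 14, 3),
        "atom14_gt_exists": ("M", 14),
        "cross_residues_mask": ("M",),
    },
}


def check_complex(entry):
    """Return a list of problems found in one complex (empty if none)."""
    problems = []
    for group_name, datasets in EXPECTED.items():
        if group_name not in entry:
            problems.append(f"missing group: {group_name}")
            continue
        group = entry[group_name]
        length = None  # number of residues, taken from the first dataset
        for key, expected in datasets.items():
            if key not in group:
                problems.append(f"missing dataset: {group_name}/{key}")
                continue
            shape = tuple(group[key].shape)
            if length is None and shape:
                length = shape[0]
            want = (length,) + expected[1:]
            if shape != want:
                problems.append(
                    f"{group_name}/{key}: shape {shape}, expected {want}"
                )
    return problems


def check_file(path):
    """Check every complex in an HDF5 file; maps complex key -> problems."""
    import h5py  # imported here so check_complex also works without h5py

    with h5py.File(path, "r") as f:
        return {name: check_complex(f[name]) for name in f}
```

Calling check_file("train_set.hdf5") would return a dictionary with one entry per complex; an empty list means no shape problems were found for that complex.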
To train for 100 epochs with 1000 noise steps (the default):
$ python optimize.py train_set.hdf5 100 model.pth
A pretrained model, named model.pth, is already included in this repository.
To sample structures for an unseen test set with a pretrained model, using 1000 noise steps (the default):
$ python test.py model.pth test_set.hdf5
This automatically creates a directory named test_set-sampled and stores the sampled structures in it.