Description
4f799d6c-5eee-43ed-90e4-54a958c69647.zip
Label: OpenFold Consortium Member
Describe the bug
Templates are processed, but template features are populated for only some chains. My leading suspicion is that gaps in a template alignment cause that template to be skipped entirely (see below).
To Reproduce
If you are able to re-run the OF3 inference query, you can inspect the saved batch (_batch.pt) and focus on the template feature with key template_backbone_frame_mask. Inspecting this variable shows that the first two chains (A and B in this case) are not read, e.g.
import torch

batch = torch.load(
    f"{your_path_to_file}/4f799d6c-5eee-43ed-90e4-54a958c69647/immrep00000/seed_2746317213/immrep00000_seed_2746317213_batch.pt"
) # change file path
batch["template_backbone_frame_mask"][0][0][:231] # this will be all zeros
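To make the per-chain check easier to repeat, here is a minimal sketch that summarizes how much of the mask is populated for each chain. It assumes the indexing used above selects one template's mask over all query tokens, and that you know the token count of each chain; the helper name `chain_mask_coverage` and the `chain_lengths` mapping are hypothetical and not part of OF3.

```python
from typing import Dict, Sequence


def chain_mask_coverage(
    mask_row: Sequence[float],
    chain_lengths: Dict[str, int],
) -> Dict[str, float]:
    """Return the fraction of non-zero mask entries for each chain.

    mask_row: one template's mask over all query tokens, e.g.
        batch["template_backbone_frame_mask"][0][0].tolist()
    chain_lengths: hypothetical mapping of chain ID to token count,
        listed in the order the chains appear in the query.
    """
    coverage = {}
    start = 0
    for chain_id, length in chain_lengths.items():
        segment = mask_row[start:start + length]
        populated = sum(1 for value in segment if value != 0)
        coverage[chain_id] = populated / length if length else 0.0
        start += length
    return coverage


# Toy mask: chain "A" (4 tokens) is empty, chain "B" (2 tokens) is populated.
toy_mask = [0, 0, 0, 0, 1, 1]
print(chain_mask_coverage(toy_mask, {"A": 4, "B": 2}))
```

A chain that reports 0.0 for every template is one whose template features were never filled in, which is the symptom described for chains A and B.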
I've attached the query JSON used to generate the saved batch above; see query_inputs/tcrpmhc_query.json, along with the alignment files in alignments. Together these should provide the input files needed to run OF3 and regenerate the template features.
I ran it with this YAML:
experiment_settings:
mode: predict
seed: 42
pytorch_ckpt_path: /mnt/inputs/of3_params/of3_ft3_v1.pt
query_json: /mnt/inputs/dataset/queries/tcrpmhc_query.json
# output_dir gets set by update_yaml(); keep it null here
output_dir: null
# use_templates is now always set by the workflow command line inputs; omit here
pl_trainer_args:
devices: 4
num_nodes: 1
precision: bf16-mixed
kubeflow: true
mpi_plugin: false
model_update:
presets: [predict] # pae and classifier are enabled in the base config
output_writer_settings:
structure_format: pdb
write_features: true # true for debugging, generally set to false
write_latent_outputs: true # true for debugging, generally set to false
with these arguments:
"--query_json",
qjson,
"--inference_ckpt_path",
ckpt, # see checkpoint path above
"--use_msa_server",
"False",
"--use_templates",
"True",
Expected behavior
I expect template features to be loaded for all chains.
Stack trace
In case it is helpful, I've looked deeper into the codebase and found a candidate for the source of the bug. Namely, I think templates are skipped when the alignment contains query positions that do not map to any template residue (indicated by a -1 in template_cache_entry.idx_map in map_token_pos_to_template_residues).
This appears to cause has_multioccupancy_residue to be set to True here:
# Skip template if query and template are still misaligned, this can happen due to
# unhandled multi-occupancy residues or author annotation errors
# TODO: add fixes and logging for these cases
has_multioccupancy_residue = (
struc.get_residue_starts(atom_array_cropped_template).shape != repeats.shape
)
which then triggers the return of an empty template here:
if has_multioccupancy_residue:
template_slice = TemplateSlice(
atom_array=AtomArray(0),
query_token_positions=np.array([]),
template_residue_repeats=np.array([]),
)
In my example, a slice of idx_map looks like this:
[ 89 89]
[ 90 90]
[ 91 91]
[ 92 -1]
[ 93 -1]
[ 94 -1]
[ 95 92]
[ 96 93]
which leads me to believe the -1 entries may be causing the early return of an empty TemplateSlice.
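For anyone triaging other templates, the following is a small sketch that lists the runs of query positions with no template partner. It assumes idx_map is a sequence of (query index, template index) pairs with -1 marking an unmapped query position, as in the slice above; the function name `unmapped_runs` is hypothetical and does not exist in the OF3 codebase.

```python
from typing import List, Sequence, Tuple


def unmapped_runs(
    idx_map: Sequence[Sequence[int]],
    gap: int = -1,
) -> List[Tuple[int, int]]:
    """Return inclusive (first, last) query indices for each run of gaps.

    idx_map: rows of (query_idx, template_idx); template_idx == gap is
        assumed to mean the query position maps to no template residue.
    """
    runs = []
    run_start = None
    previous_query = None
    for query_idx, template_idx in idx_map:
        if template_idx == gap:
            # Open a new run if we are not already inside one.
            if run_start is None:
                run_start = query_idx
        elif run_start is not None:
            # A mapped position closes the current run.
            runs.append((run_start, previous_query))
            run_start = None
        previous_query = query_idx
    # Close a run that extends to the end of the map.
    if run_start is not None:
        runs.append((run_start, previous_query))
    return runs


# The slice of idx_map shown above.
example = [
    (89, 89), (90, 90), (91, 91),
    (92, -1), (93, -1), (94, -1),
    (95, 92), (96, 93),
]
print(unmapped_runs(example))  # [(92, 94)]
```

If every skipped template has at least one such run and every retained template has none, that would support the hypothesis that the -1 entries are behind the empty TemplateSlice.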
Configuration (please complete the following information):
- GPU: A100 node (96 CPUs)
- Installation from repo
Additional context
This may be related to another issue posted by my colleague here with properly formatting .sto files: #42
I used .sto files generated by the OpenFold2 pipeline. Please let me know if that is not valid.