
[BUG] Template alignments skipped after processing. #43

@halasadi

Description


4f799d6c-5eee-43ed-90e4-54a958c69647.zip

Label: OpenFold Consortium Member

Describe the bug

Templates are processed, but template features are populated for only some chains. My leading suspicion is that gaps in a template alignment may be causing that template to be skipped entirely (see below).
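The following minimal sketch makes the suspected mechanism concrete. It shows how a pairwise query/template alignment can be turned into a query-to-template residue index map in which a template gap appears as -1. It is hypothetical and is not OF3's implementation. It assumes both rows are equal-length aligned strings that use `-` or `.` as gap characters. `pairwise_index_map` is an illustrative name.

```python
# Hypothetical illustration, not OF3 code: gap characters assumed to be '-' and '.'
GAP_CHARS = set("-.")


def pairwise_index_map(query_row, template_row):
    """Map each query residue index to a template residue index, or to -1
    when the template has a gap in that alignment column."""
    if len(query_row) != len(template_row):
        raise ValueError("aligned rows must have equal length")
    pairs = []
    q_idx = t_idx = 0
    for q_char, t_char in zip(query_row, template_row):
        q_is_residue = q_char not in GAP_CHARS
        t_is_residue = t_char not in GAP_CHARS
        if q_is_residue:
            # Query residue aligned to a template gap -> unmapped (-1)
            pairs.append((q_idx, t_idx if t_is_residue else -1))
            q_idx += 1
        if t_is_residue:
            # Template residues advance the template counter even under a query gap
            t_idx += 1
    return pairs
```

For example, aligning `ABCDEFGH` against `ABC---GH` yields three consecutive `-1` entries for query residues 3 to 5. This mirrors the shape of the `idx_map` slice reported below.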

To Reproduce
If you are able to re-run the OF3 inference query, you can inspect the saved batch (`_batch.pt`) and focus on the template feature with key `template_backbone_frame_mask`. Inspecting this variable shows that the first two chains (A and B in this case) are not read, e.g.:

```python
import torch

# Change the file path: set your_path_to_file to your output directory
batch = torch.load(
    f"{your_path_to_file}/4f799d6c-5eee-43ed-90e4-54a958c69647/immrep00000/seed_2746317213/immrep00000_seed_2746317213_batch.pt"
)
batch["template_backbone_frame_mask"][0][0][:231]  # this will be all zeros
```

I've attached the query JSON used to generate the saved batch above (see query_inputs/tcrpmhc_query.json), along with the alignment files in alignments. Together these should provide the inputs needed to run OF3 and regenerate the template features.

I ran it with this YAML:

```yaml
experiment_settings:
  mode: predict
  seed: 42
  pytorch_ckpt_path: /mnt/inputs/of3_params/of3_ft3_v1.pt
  query_json: /mnt/inputs/dataset/queries/tcrpmhc_query.json
  # output_dir gets set by update_yaml(); keep it null here
  output_dir: null
  # use_templates is now always set by the workflow command line inputs; omit here

pl_trainer_args:
  devices: 4
  num_nodes: 1
  precision: bf16-mixed
  kubeflow: true
  mpi_plugin: false

model_update:
  presets: [predict] # pae and classifier are enabled in the base config

output_writer_settings:
  structure_format: pdb
  write_features: true # true for debugging, generally set to false
  write_latent_outputs: true # true for debugging, generally set to false
```

with these arguments:

```python
"--query_json",
qjson,
"--inference_ckpt_path",
ckpt,  # see checkpoint above
"--use_msa_server",
"False",
"--use_templates",
"True",
```

Expected behavior
I expect template features to be loaded for all chains.

Stack trace
In case it is helpful, I looked deeper into the codebase and found candidate locations for the source of the bug. Namely, I think templates are skipped when their alignment contains query positions that do not map to any template residue (indicated by a -1 in template_cache_entry.idx_map in map_token_pos_to_template_residues).

This causes has_multioccupancy_residue to be set to True here:

```python
# Skip template if query and template are still misaligned, this can happen due to
# unhandled multi-occupancy residues or author annotation errors
# TODO: add fixes and logging for these cases
has_multioccupancy_residue = (
    struc.get_residue_starts(atom_array_cropped_template).shape != repeats.shape
)
```

and then triggers the return of an empty template here:

```python
if has_multioccupancy_residue:
    template_slice = TemplateSlice(
        atom_array=AtomArray(0),
        query_token_positions=np.array([]),
        template_residue_repeats=np.array([]),
    )
```
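The check above flags a template whenever the number of residue starts in the cropped template differs from the length of `repeats`. The toy sketch below does not reproduce OF3's code. It assumes, purely for illustration, that `repeats` counts query positions per template residue. Under that assumption, counting without first dropping the -1 entries adds one extra bucket, which is enough to trip a shape comparison of this kind. `toy_shape_check` is a hypothetical name.

```python
import numpy as np


def toy_shape_check(idx_map):
    """Hypothetical illustration only: compare the number of mapped template
    residues with the length of a 'repeats' array computed with and without
    filtering out -1 (unmapped) entries."""
    idx_map = np.asarray(idx_map)
    template_idx = idx_map[:, 1]
    mapped = template_idx[template_idx >= 0]
    # Stand-in for the residue count of the cropped template: mapped residues only
    n_residues = np.unique(mapped).shape
    # Counting unfiltered indices gives -1 its own bucket
    _, repeats_unfiltered = np.unique(template_idx, return_counts=True)
    _, repeats_filtered = np.unique(mapped, return_counts=True)
    return {
        "mismatch_if_unfiltered": n_residues != repeats_unfiltered.shape,
        "mismatch_if_filtered": n_residues != repeats_filtered.shape,
    }
```

With the `idx_map` slice shown below, the unfiltered count reports a mismatch and the filtered count does not. Whether OF3 actually builds `repeats` this way would need to be confirmed in the source.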

In my example, a slice of idx_map looks like this:

```
[ 89  89]
[ 90  90]
[ 91  91]
[ 92  -1]
[ 93  -1]
[ 94  -1]
[ 95  92]
[ 96  93]
```

This leads me to believe that the -1 entries may be causing the early return of an empty TemplateSlice.
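A helper can locate every gap of this kind in a full `idx_map` instead of relying on inspection of a slice. The hypothetical `unmapped_runs` below reports each contiguous run of -1 entries as a (first, last) pair of query indices. It assumes `idx_map` is an (N, 2) array whose first column is the query index and whose second column is the template index, as in the slice shown. For that slice it returns `[(92, 94)]`.

```python
import numpy as np


def unmapped_runs(idx_map):
    """Return (first, last) query indices for each contiguous run of rows
    whose template index is -1."""
    idx_map = np.asarray(idx_map)
    runs = []
    start = None
    for row, (q_idx, t_idx) in enumerate(idx_map):
        if t_idx == -1 and start is None:
            # A new unmapped run begins at this query position
            start = int(q_idx)
        elif t_idx != -1 and start is not None:
            # The run ended on the previous row
            runs.append((start, int(idx_map[row - 1, 0])))
            start = None
    if start is not None:
        # The run extends to the last row
        runs.append((start, int(idx_map[-1, 0])))
    return runs
```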

Configuration (please complete the following information):

  • GPU: A100 node (96 CPUs)
  • Installation: from repo

Additional context
This may be related to another issue, posted by my colleague, about properly formatting .sto files: #42
I used .sto files generated by the OpenFold2 pipeline. Please let me know if those are not valid.
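The minimal Stockholm reader below is a quick sanity check on the .sto inputs. It is a sketch, not OF3's parser. It collects each named row across interleaved blocks and verifies that all rows have the same aligned length. It checks only basic Stockholm structure and does not verify any additional conventions OF3 may expect of template alignments. `read_stockholm` and `check_alignment` are hypothetical names.

```python
def read_stockholm(text):
    """Minimal Stockholm reader: return {sequence_name: aligned_sequence}.

    Rows split across blocks are concatenated; annotation lines starting
    with '#' are skipped, and reading stops at the '//' terminator."""
    seqs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line == "//":
            break
        name, seq = line.split(maxsplit=1)
        seqs[name] = seqs.get(name, "") + seq.strip()
    return seqs


def check_alignment(seqs):
    """Raise if aligned rows differ in length; otherwise return the length."""
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError(f"rows have differing lengths: {sorted(lengths)}")
    return lengths.pop()
```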

Labels: OpenFold Consortium Member, bug (Something isn't working), data preprocessing (Relating to the preprocessing of queries and datasets)