I recently got started using DeepRank2 to see if it can make a capable predictor on some PPI structure data that I have. I successfully installed the GPU version using Conda (and pip) and have been able to train a GINet model, following the tutorial pretty closely. The code I used is below. I know the data generation was successful, since I am able to visualize it using GraphDataset.hdf5_to_pandas().
Not shown: loading the data and splitting it using train_test_split(), basically the same as in the tutorial.
My issue arises when I try to run an inference script that I wrote to load a totally separate dataset (not derived from the training data) and run a class prediction using the pre-trained GINet. I construct the GraphDataset with clustering_method and train_source set to match my pre-trained model and pass it to the Trainer as dataset_test only, but when running Trainer.test() I still get the error 'GlobalStorage' object has no attribute 'cluster0', which I know means that there is no clustering on my dataset, even though I have that option set. Code below:
# Imports added here for completeness (module paths per the deeprank2 docs)
import os

from deeprank2.dataset import GraphDataset
from deeprank2.neuralnets.gnn.ginet import GINet
from deeprank2.query import ProteinProteinInterfaceQuery, QueryCollection
from deeprank2.trainer import Trainer
from deeprank2.utils.exporters import HDF5OutputExporter

queries = QueryCollection()
for pdb in input_files:
    queries.add(
        ProteinProteinInterfaceQuery(
            pdb_path=pdb,
            resolution='residue',
            chain_ids=['B', 'T'],
        ),
        verbose=True,
    )
inference_data = queries.process(
    prefix=os.path.join(data_output_dir, os.path.basename(input_dir)),
    log_error_traceback=True,
    combine_output=True,
)
input_data = GraphDataset(
    hdf5_path=inference_data,
    clustering_method='mcl',  # I thought this would address any clustering errors
    train_source="model.pth.tar",
    use_tqdm=True,
)
model = Trainer(
    GINet,
    dataset_test=input_data,
    pretrained_model="model.pth.tar",
    output_exporters=[HDF5OutputExporter("Predictions")],
)
# No errors raised up to this point
results = model.test()  # Raises 'GlobalStorage' object has no attribute 'cluster0' error
So I'm wondering if there's an issue with the way I do the data generation for my new dataset, such that the clustering operation isn't happening correctly and thus isn't recognized by the model; or if there could be an issue with the data itself, which I can't open with GraphDataset.hdf5_to_pandas(); or anything else that could be causing this error.
For reference, I am running version 3.1.0 on Python 3.10.0 with torch 2.1.1 and PyG 2.4.0. OS is Ubuntu 22.04, and I edit and run the code using the interactive code cell feature on VSCode. GPU is A100.
I am only passingly familiar with Torch and PyG (mostly through using this package). Any help is appreciated. Thanks!
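One way to narrow down whether the clustering step actually wrote anything is to inspect the processed HDF5 file directly. The sketch below (my own helper, not part of deeprank2; it assumes h5py is installed, and the file path is illustrative) walks the file and lists every key whose name mentions "cluster":

```python
import os

import h5py  # assumed installed alongside deeprank2


def find_cluster_keys(hdf5_path):
    """Return every HDF5 path in the file whose name mentions 'cluster'."""
    hits = []

    def visit(name):
        if "cluster" in name.lower():
            hits.append(name)

    with h5py.File(hdf5_path, "r") as f:
        f.visit(visit)  # calls visit() once per group/dataset path
    return hits


# Illustrative path: point this at the file produced by queries.process()
if os.path.exists("inference_data.hdf5"):
    print(find_cluster_keys("inference_data.hdf5"))
```

An empty result would indicate that no cluster assignments were ever written for the new dataset, independent of anything the Trainer does at test time.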
UPDATE:
I think I have figured out why and a way around it, but this may warrant an addition to the source code.
From deeprank2/deeprank2/trainer.py, lines 131 to 184 at 78e2773:
            msg = f"Invalid node clustering method: {self.clustering_method}. Please set clustering_method to 'mcl', 'louvain' or None."
            raise ValueError(msg)
else:
    if self.neuralnet is None:
        msg = "No neural network class found. Please add it to complete loading the pretrained model."
        raise ValueError(msg)
    if self.dataset_test is None:
        msg = "No dataset_test found. Please add it to evaluate the pretrained model."
        raise ValueError(msg)
    if self.dataset_train is not None:
        self.dataset_train = None
        _log.warning("Pretrained model loaded: dataset_train will be ignored.")
    if self.dataset_val is not None:
        self.dataset_val = None
        _log.warning("Pretrained model loaded: dataset_val will be ignored.")
    self._init_from_dataset(self.dataset_test)
    self._load_params()
    self._load_pretrained_model()
This code block only computes clusters when not loading a pre-trained model. When loading a pre-trained model with only a dataset_test, self._precluster() is never run, even if clustering_method is set on the GraphDataset. This behaviour is incompatible with ginet.GINet, since that architecture requires community pooling to function.
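To make the gap concrete, here is a toy sketch of where a preclustering call could go in the pretrained-model branch. All names here are stand-ins invented for illustration, not the actual deeprank2 source:

```python
# Toy illustration only: class and attribute names mimic the question,
# not the real deeprank2 Trainer.
class PretrainedTrainerSketch:
    def __init__(self, dataset_test, clustering_method):
        self.dataset_test = dataset_test
        self.clustering_method = clustering_method
        self.preclustered = False
        # Proposed addition: even when a pretrained model is loaded with
        # only a test set, honor the dataset's clustering_method so that
        # community-pooling architectures like GINet find their clusters.
        if self.clustering_method in ("mcl", "louvain"):
            self._precluster(self.dataset_test)

    def _precluster(self, dataset):
        # Stand-in for Trainer._precluster(), which computes cluster
        # assignments and writes them into the dataset's HDF5 file.
        self.preclustered = True
```

With a guard like this, the test set would be preclustered automatically whenever a clustering method is configured, instead of relying on the caller to invoke a private method.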
For now, adding the line model._precluster(input_data) to my code (reusing the definitions above) forces the Trainer instance to compute clusters on my dataset. If I'm not mistaken, this can be done after constructing the Trainer without issue, since the clustering information is saved to the .hdf5 file.
I may add an issue referencing this behavior as it'd be nice to have this step added to the source code.
Still I'd like confirmation that this is a valid solution and won't introduce any silent bugs, particularly in my classification results.