
Bug: Adding feature_transforms with a non-None 'transform' value causes ASCII errors during training #653

@Norgent

Description


I've consistently had this error with any and all feature types in the modules Contact, Components, SurfaceArea, and IRC. No combination of standardization with a non-None transform avoids it. I went back through all my data to check for NaN or non-numeric values, and there are none.

No matter what, if I add any transform to any of my features, even something as simple as `lambda t: t.astype(np.float32)`, I receive an ASCII decode error with the traceback below. The offending byte and its position change from run to run. The error typically occurs a couple of epochs into a training loop.

```
  self.df.to_hdf(
Traceback (most recent call last):
  File "/home/bizon/deepranking/deepranker_script.py", line 60, in <module>
    func(**args)
  File "/home/bizon/deepranking/training_wrap.py", line 200, in train_gnn_classifier
    model.train(
  File "/home/bizon/deepranking/deeprank2/deeprank2/trainer.py", line 629, in train
    checkpoint_model = self._save_model()
  File "/home/bizon/deepranking/deeprank2/deeprank2/trainer.py", line 921, in _save_model
    deserialized_func = dill.loads(serialized_func)  # noqa: S301
  File "/home/bizon/anaconda3/envs/dr2/lib/python3.10/site-packages/dill/_dill.py", line 311, in loads
    return load(file, ignore, **kwds)
  File "/home/bizon/anaconda3/envs/dr2/lib/python3.10/site-packages/dill/_dill.py", line 297, in load
    return Unpickler(file, ignore=ignore, **kwds).load()
  File "/home/bizon/anaconda3/envs/dr2/lib/python3.10/site-packages/dill/_dill.py", line 452, in load
    obj = StockUnpickler.load(self)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x83 in position 16: ordinal not in range(128)
```
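One way to narrow this down is to check the transform in isolation. The traceback shows the failure inside `dill.loads`, called from `_save_model` in `trainer.py`, so the transform function appears to be serialized and deserialized when a checkpoint is saved. The minimal sketch below is a stand-in under that assumption, not DeepRank2 code: it uses the standard-library `pickle` instead of `dill` so it runs without extra packages, and `to_log1p` and `roundtrip_ok` are hypothetical names. Standard `pickle` stores a module-level function by reference (module plus qualified name) and refuses lambdas outright, whereas `dill` serializes lambdas by value, so defining the transform as a named module-level function may be worth trying in place of the lambda.

```python
import math
import pickle
from typing import Callable


def to_log1p(t: float) -> float:
    """Hypothetical example transform, defined at module level (not a lambda)."""
    return math.log1p(t)


def roundtrip_ok(func: Callable[[float], float], sample: float) -> bool:
    """Serialize `func`, load it back, and check it gives the same result on `sample`.

    Assumption: stdlib pickle stands in for dill here. pickle stores a
    module-level function by reference, so no compiled code is embedded
    in the bytes; lambdas cannot be pickled this way at all.
    """
    try:
        restored = pickle.loads(pickle.dumps(func))
    except (pickle.PicklingError, AttributeError, TypeError):
        # Raised for lambdas and nested (local) functions.
        return False
    return restored(sample) == func(sample)


if __name__ == "__main__":
    print("named function:", roundtrip_ok(to_log1p, 1.0))
    print("lambda:", roundtrip_ok(lambda t: t + 1.0, 1.0))
```

With a NumPy transform such as `astype(np.float32)`, `sample` would be a small array and the final comparison would need `np.array_equal` instead of `==`.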

Additionally, I consistently get the following warning whether or not the error above occurs; I'm unsure whether it is related:

```
PerformanceWarning:
your performance may suffer as PyTables will pickle object types that it cannot
map directly to c-types [inferred_type->mixed,key->block1_values] [items->Index(['phase', 'entry', 'output'], dtype='object')]
```

I'd like some help either circumventing or fixing whatever is going on, as this is preventing me from tuning my training.

OS is Ubuntu 22.04 (x86), DeepRank2 v3.1.0 with PyG 2.4.0 and torch 2.1.1 on Python 3.10.0.
The installation is for GPU, running on an A100 80 GB with CUDA 12.1, not containerized.

Thanks

Metadata


Assignees: No one assigned

Labels: bug (Something isn't working)

Type: No type

Projects: No projects

Milestone: No milestone

Relationships: None yet

Development: No branches or pull requests
