Description
Technical issues in `Tox21MolNet`:
Issue 1: Missing `group` Key
I've encountered an issue with the `setup_processed` method when working with `Tox21MolNet` and its data (the `tox21.csv` file). The file does not include a column or key named `"group"`, which causes a `KeyError` in the line:
groups = np.array([d["group"] for d in data])
Additionally, the `_load_data_from_file` method does not seem to use any Reader to create or handle a `"group"` key in the data. As a result, the `group` key does not exist in the dictionaries produced by `_load_data_from_file`, which leads to the observed error.
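The failure can be reproduced with plain dictionaries shaped like the records the loader yields (the values below are made up). The sketch also shows one possible guard, `extract_groups`, which is a hypothetical helper and not part of the project. Whether a missing group should fall back to an ungrouped split, or should instead be produced by a Reader, is a design decision this sketch does not settle.

```python
from typing import Any, Dict, List, Optional

# Records shaped like the ones _load_data_from_file yields (made-up values).
records: List[Dict[str, Any]] = [
    dict(features="CCO", labels=[False], ident="TOX0001"),
    dict(features="c1ccccc1", labels=[True], ident="TOX0002"),
]

# Reproduces the failure: none of the records has a "group" key.
try:
    groups = [d["group"] for d in records]
except KeyError as err:
    print("missing key:", err)  # missing key: 'group'


def extract_groups(data: List[Dict[str, Any]]) -> Optional[List[Any]]:
    """Hypothetical guard: return every record's "group", or None if any is missing.

    A caller could fall back to an ungrouped split when this returns None,
    instead of raising KeyError.
    """
    if all("group" in d for d in data):
        return [d["group"] for d in data]
    return None
```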
The `_load_data_from_file` method yields only three keys: `features`, `labels`, and `ident`:
yield dict(features=smiles, labels=labels, ident=row["mol_id"])
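For reference, here is a minimal sketch of a loader that yields records in this three-key shape. It assumes the CSV has a `smiles` column next to `mol_id` and, for brevity, uses only two of the Tox21 assay columns (`NR-AR`, `SR-p53`). The function name `load_data_from_file`, the column list, and the choice to map empty cells to `None` are illustrative assumptions, not the project's code.

```python
import csv
import io
from typing import Dict, Iterator, List, Optional, TextIO

# Assumption: two of the Tox21 assay columns, chosen only to keep the example short.
LABEL_COLUMNS = ["NR-AR", "SR-p53"]


def load_data_from_file(handle: TextIO) -> Iterator[Dict]:
    """Yield one dict per CSV row with exactly the keys features, labels and ident."""
    for row in csv.DictReader(handle):
        # Empty cells (assay not measured) become None instead of a fake 0/1 label.
        labels: List[Optional[bool]] = [
            bool(float(row[col])) if row[col] != "" else None for col in LABEL_COLUMNS
        ]
        yield dict(features=row["smiles"], labels=labels, ident=row["mol_id"])


# Made-up sample file; the "SR-p53" cell of the only row is empty.
sample = "NR-AR,SR-p53,mol_id,smiles\n0,,TOX0001,CCO\n"
records = list(load_data_from_file(io.StringIO(sample)))
print(records)  # [{'features': 'CCO', 'labels': [False, None], 'ident': 'TOX0001'}]
```

No `group` key is produced anywhere in this shape, which is why the lookup in `setup_processed` fails.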
Issue 2: Generator Issue with `train_test_split`
Another issue arises from the use of a generator in the `_load_data_from_file` method. The generator object cannot be passed directly to `train_test_split`, which expects a collection (e.g., a list or array). This causes the following error:
TypeError: Singleton array array(<generator object Tox21MolNet._load_data_from_file at 0x000001FD068AB1B0>, dtype=object) cannot be considered a valid collection.
Solution: To fix this, the generator output should be converted to a list before using it for splitting:
data = list(self._load_data_from_file(os.path.join(self.raw_dir, "tox21.csv")))
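The effect of materializing the generator can be shown without scikit-learn. This sketch uses a hypothetical `simple_split` that, like `train_test_split`, needs `len()` and indexing, together with a hypothetical generator `iter_records` standing in for `_load_data_from_file`. It is only an illustration, not scikit-learn's implementation.

```python
import random
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def iter_records() -> Iterator[dict]:
    """Hypothetical stand-in for _load_data_from_file: lazily yields ten records."""
    for i in range(10):
        yield dict(features="C" * (i + 1), labels=[i % 2], ident=f"MOL{i}")


def simple_split(items: Sequence[T], test_size: float, seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle indices and cut off a test fraction; requires len() and indexing."""
    n = len(items)  # a generator has no len(), so this raises TypeError
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = round(n * test_size)
    train = [items[i] for i in indices[n_test:]]
    test = [items[i] for i in indices[:n_test]]
    return train, test


try:
    simple_split(iter_records(), test_size=0.2)
except TypeError as err:
    print("generator rejected:", err)  # generator rejected: object of type 'generator' has no len()

# The fix from the issue: materialize the generator into a list first.
data = list(iter_records())
train_set, test_set = simple_split(data, test_size=0.2)
print(len(train_set), len(test_set))  # 8 2
```

Materializing costs memory proportional to the dataset, but any shuffled split needs random access to all records anyway.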
Tests
- Tox21MolNet:
  - Write unit tests for `setup_processed()` with mock data.
    - Check that the output format is correct: the collator expects a dict with `features`, `labels`, and `ident` keys, and `features` must be convertible to a tensor.
  - Write unit tests for `_load_data_from_file()` using mock file operations.
  - Write unit tests for
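A possible starting point for the `_load_data_from_file()` test item, using `unittest.mock.mock_open` to fake the CSV file. Because `Tox21MolNet` is not imported here, `load_rows` is a hypothetical stand-in that mirrors the `yield` line quoted in Issue 1. It assumes the real method reads the file with the built-in `open`; a real test would patch `open` around the actual method instead. The tensor-conversion check is left out because it depends on how the Reader encodes `features`.

```python
import csv
import unittest
from typing import Dict, Iterator
from unittest import mock


def load_rows(path: str) -> Iterator[Dict]:
    """Hypothetical stand-in for Tox21MolNet._load_data_from_file (one label column)."""
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            label = bool(float(row["NR-AR"])) if row["NR-AR"] else None
            yield dict(features=row["smiles"], labels=[label], ident=row["mol_id"])


# Made-up mock file contents: the second row has a missing label.
MOCK_CSV = "NR-AR,mol_id,smiles\n1,TOX0001,CCO\n,TOX0002,c1ccccc1\n"


class LoadRowsTest(unittest.TestCase):
    def test_records_have_collator_keys(self):
        # Consume the generator inside the patch so the fake file is what gets read.
        with mock.patch("builtins.open", mock.mock_open(read_data=MOCK_CSV)):
            records = list(load_rows("tox21.csv"))
        self.assertEqual(len(records), 2)
        for record in records:
            # The collator expects exactly these three keys.
            self.assertEqual(set(record), {"features", "labels", "ident"})
        self.assertEqual(records[0]["labels"], [True])
        self.assertEqual(records[1]["labels"], [None])

    def test_returns_generator_not_list(self):
        # Calling a generator function does not run its body, so open() is never hit.
        result = load_rows("tox21.csv")
        self.assertFalse(hasattr(result, "__len__"))
```

Saved as a test module, this runs with `python -m unittest`. The second test documents the behaviour behind Issue 2: callers must materialize the result before splitting.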