Technical issues in Tox21MolNet + corresponding Tests #53

@aditya0by0

Technical issues in Tox21MolNet:

Issue 1: Missing group key

I ran into an issue with the setup_processed method when working with Tox21MolNet and its data (the tox21.csv file). The file has no header or key named "group", which causes a KeyError in this line:

groups = np.array([d["group"] for d in data])

Additionally, the _load_data_from_file method does not use any Reader to create or handle a "group" key, so the dictionaries it produces never contain that key, which leads to the error above.
The _load_data_from_file method yields only three keys: features, labels, and ident:

yield dict(features=smiles, labels=labels, ident=row["mol_id"])
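
To make the failure concrete, here is a minimal sketch that reproduces the KeyError on records shaped like the ones yielded above. The record values and the extract_groups helper are hypothetical and only illustrate one possible guard; this is not necessarily how the project should resolve the missing key.

```python
# Records shaped like the ones _load_data_from_file yields (values are made up).
data = [
    dict(features="CCO", labels=[0, 1], ident="TOX1"),
    dict(features="c1ccccc1", labels=[1, 0], ident="TOX2"),
]


def extract_groups(records):
    """Hypothetical helper: return the group values, or None if any record lacks 'group'."""
    if all("group" in d for d in records):
        return [d["group"] for d in records]
    return None


missing_key = None
try:
    # Same lookup as in setup_processed, without the np.array wrapper.
    groups = [d["group"] for d in data]
except KeyError as err:
    missing_key = err.args[0]  # name of the key that was not found

# The guard returns None instead of raising when no record carries a group.
groups = extract_groups(data)
```

With a guard like this, the caller could fall back to an ungrouped split whenever groups is None.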

Issue 2: Generator Issue with train_test_split

Another issue arises from the use of a generator in the _load_data_from_file method. The generator object cannot be directly passed to train_test_split, as it expects a collection (e.g., a list or array). This causes the following error:

TypeError: Singleton array array(<generator object Tox21MolNet._load_data_from_file at 0x000001FD068AB1B0>,
      dtype=object) cannot be considered a valid collection.

Solution: To fix this, the generator output should be converted to a list before using it for splitting:

data = list(self._load_data_from_file(os.path.join(self.raw_dir, "tox21.csv")))
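
The sketch below shows why materializing the generator helps, using only the standard library. Both load_records and simple_split are hypothetical stand-ins (for _load_data_from_file and train_test_split respectively); the real train_test_split raises a different message, but for a similar reason: a generator has no length and cannot be indexed.

```python
import random


def load_records():
    """Hypothetical stand-in for _load_data_from_file: yields one dict per row."""
    for i in range(10):
        yield dict(features=f"C{i}", labels=[i % 2], ident=f"TOX{i}")


def simple_split(records, test_fraction=0.2, seed=0):
    """Toy stand-in for train_test_split; it needs len() and indexing."""
    indices = list(range(len(records)))
    random.Random(seed).shuffle(indices)
    n_test = int(len(records) * test_fraction)
    test = [records[i] for i in indices[:n_test]]
    train = [records[i] for i in indices[n_test:]]
    return train, test


# Passing the generator directly fails, because generators have no len().
try:
    simple_split(load_records())
    generator_error = None
except TypeError as err:
    generator_error = err

# Converting to a list first gives the splitter a sized, indexable collection.
data = list(load_records())
train, test = simple_split(data)
```

Note that a generator can also be consumed only once, so converting it to a list is needed anyway if the data is used again after splitting.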

Tests

  • Tox21MolNet:
    • Write unit tests for setup_processed() with mock data.
      • Check that the output format is correct: the collator expects a dict with features, labels, and ident keys, and features must be convertible to a tensor.
    • Write unit tests for _load_data_from_file() using mock file operations.
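
A possible shape for the _load_data_from_file test, sketched with unittest.mock. The load_data_from_file function below is a hypothetical stand-in for the real method, and the CSV column names other than mol_id (the two label columns and smiles) are assumptions made for the mock data.

```python
import csv
import unittest
from unittest.mock import mock_open, patch

LABEL_HEADERS = ["NR-AR", "NR-AhR"]  # assumed label columns for the mock file
MOCK_CSV = (
    "NR-AR,NR-AhR,mol_id,smiles\n"
    "0,1,TOX1,CCO\n"
    ",0,TOX2,c1ccccc1\n"
)


def load_data_from_file(path):
    """Hypothetical stand-in for Tox21MolNet._load_data_from_file."""
    with open(path, "r") as f:
        for row in csv.DictReader(f):
            # Empty cells become None (missing label).
            labels = [int(row[h]) if row[h] else None for h in LABEL_HEADERS]
            yield dict(features=row["smiles"], labels=labels, ident=row["mol_id"])


class LoadDataFromFileTest(unittest.TestCase):
    def _load(self):
        # Replace the built-in open so no real file is touched.
        with patch("builtins.open", mock_open(read_data=MOCK_CSV)):
            return list(load_data_from_file("tox21.csv"))

    def test_keys_match_collator_expectation(self):
        for record in self._load():
            self.assertEqual(set(record), {"features", "labels", "ident"})

    def test_values_are_parsed(self):
        records = self._load()
        self.assertEqual(records[0]["ident"], "TOX1")
        self.assertEqual(records[0]["labels"], [0, 1])
        self.assertEqual(records[1]["labels"], [None, 0])


# Run the suite explicitly so the snippet also works outside a test runner.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(LoadDataFromFileTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

A test for setup_processed() could follow the same pattern, patching _load_data_from_file to return a fixed list of such records.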
