
Preprocessing unit tests #45

@sfluegel05

We already have some tests for data preprocessing. However, these are integration tests that capture the behaviour of the tool as a whole rather than unit tests for specific functions.
To test the different preprocessing functionalities efficiently, we need to add smaller-scale unit tests. These should not rely on real data but on sample input values that can be generated from scratch.
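
As an illustration of the kind of small-scale test meant here, below is a minimal sketch using Python's built-in unittest. The function `to_token_ids()` is a hypothetical stand-in written only for this example and is not part of the existing readers. The point is that the test builds its input ("CCO") and vocabulary by hand instead of loading a dataset.

```python
import unittest


def to_token_ids(text, vocab):
    """Hypothetical stand-in for a reader's _read_data(): map each character
    to an index, appending unseen characters to the vocabulary."""
    ids = []
    for char in text:
        if char not in vocab:
            vocab.append(char)
        ids.append(vocab.index(char))
    return ids


class TestToTokenIds(unittest.TestCase):
    def test_known_and_new_tokens(self):
        # Synthetic input built from scratch: no real dataset is needed.
        vocab = ["C"]
        self.assertEqual(to_token_ids("CCO", vocab), [0, 0, 1])
        # The unseen token "O" was appended to the vocabulary.
        self.assertEqual(vocab, ["C", "O"])

    def test_empty_input(self):
        self.assertEqual(to_token_ids("", ["C"]), [])
```

Such a test runs in milliseconds (e.g. via `python -m unittest` or pytest) and pins down a single behaviour, here the index assignment for new tokens, independently of any data files.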

Here are the classes / functions that should be covered (from the implementation in the protein_prediction branch):

reader.py:
  • DataReader: to_data()
  • ChemDataReader: _read_data()
  • DeepChemDataReader: _read_data()
  • SelfiesReader: _read_data()
  • ProteinDataReader: _read_data()

collate.py:
  • DefaultCollator: __call__()
  • RaggedCollator: __call__(), process_label_rows()

datasets/base.py:
  • XYBaseDataModule: _filter_labels()
  • DynamicDataset: get_test_split(), get_train_val_splits_given_test()

datasets/chebi.py:
  • _ChEBIDataExtractor: _extract_class_hierarchy(), _graph_to_raw_dataset(), _load_dict(), _setup_pruned_test_set()
  • ChEBIOverX: select_classes()
  • ChEBIOverXPartial: extract_class_hierarchy()
  • term_callback()

datasets/go_uniprot.py:
  • _GOUniprotDataExtractor: _extract_class_hierarchy(), term_callback(), _graph_to_raw_dataset(), _get_swiss_to_go_mapping(), _load_dict()
  • _GoUniProtOverX: select_classes()

datasets/tox21.py:
  • Tox21MolNet: setup_processed(), _load_data_from_file()
  • Tox21Challenge: setup_processed(), _load_data_from_file(), _load_dict()

Some of these functions read from or write to files. Instead of using real files, I would suggest using mock objects (see e.g. this comment).
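
A minimal sketch of this approach with `unittest.mock.mock_open` from the standard library (Python 3.8+ is needed so that iterating over the mocked file yields its lines). `load_dict()` and its "<id>,<label>" line format are hypothetical and only stand in for a file-reading method such as `_load_dict()`; the actual file formats in the repository may differ.

```python
import unittest
from unittest.mock import mock_open, patch


def load_dict(path):
    """Hypothetical stand-in for a _load_dict()-style method: yields one
    dict per non-empty line of the form "<id>,<label>"."""
    with open(path, "r") as f:
        for line in f:
            line = line.strip()
            if line:
                key, label = line.split(",")
                yield {"id": key, "label": label}


class TestLoadDict(unittest.TestCase):
    def test_reads_mocked_file(self):
        fake_content = "1,A\n2,B\n\n"
        # Patch the built-in open() so that no real file is touched.
        with patch("builtins.open", mock_open(read_data=fake_content)) as mocked:
            # list() consumes the generator while the patch is still active.
            result = list(load_dict("dummy/path.csv"))
        mocked.assert_called_once_with("dummy/path.csv", "r")
        self.assertEqual(
            result, [{"id": "1", "label": "A"}, {"id": "2", "label": "B"}]
        )
```

For functions that write files, the same pattern applies: the mocked file handle is available as `mocked.return_value`, so calls to `write()` can be checked with e.g. `mocked.return_value.write.assert_called_once_with(...)`.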
