Potential Processing Error in the Original QM8 Dataset on Some Tasks #41

@rbharath

There is a potential error in the QM8 dataset from the original MoleculeNet paper: some task columns appear to be duplicated, possibly because of a pandas data-processing error.

deepchem/deepchem#2747
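One way to check for this kind of problem is to look for task columns that hold identical values under different names. The snippet below is a minimal sketch in pandas. It is not the check used in the linked issue or the fix PR, and the DataFrame and the column names `task_A`, `task_B`, `task_C` are hypothetical toy data, not the actual QM8 columns.

```python
import pandas as pd


def find_duplicate_columns(df: pd.DataFrame) -> list[tuple[str, str]]:
    """Return pairs of column names whose values are exactly identical."""
    pairs = []
    cols = list(df.columns)
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            # Series.equals compares values elementwise (NaNs in the same
            # positions count as equal); column names are ignored.
            if df[a].equals(df[b]):
                pairs.append((a, b))
    return pairs


# Hypothetical toy frame: "task_B" was accidentally filled with
# "task_A"'s values, mimicking a duplicated target column.
df = pd.DataFrame({
    "smiles": ["C", "CC", "CCC"],
    "task_A": [0.21, 0.34, 0.27],
    "task_B": [0.21, 0.34, 0.27],
    "task_C": [0.19, 0.30, 0.25],
})

print(find_duplicate_columns(df))  # [('task_A', 'task_B')]
```

The pairwise comparison is quadratic in the number of columns, which is cheap for the small number of task columns in a dataset like this. Because `Series.equals` requires exact equality, two genuinely different DFT runs that merely agree closely are not flagged.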

We are still working to verify the error, but in the meantime a fix PR is under review that you can use:

deepchem/deepchem#2756

Assuming the error is confirmed, the QM8 benchmark numbers may need to be rerun. The duplicated columns correspond to two very similar tasks, though: both predict DFT results for the same molecule, computed with the same functional but different basis sets. I therefore suspect the qualitative changes will be relatively minor, since models have in effect been predicting one DFT run twice instead of two slightly different DFT runs.
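To make the last point concrete, here is a toy sketch with made-up numbers (not QM8 values). `task_a` and `task_b` are hypothetical stand-ins for the same quantity computed with two basis sets, and the "model" simply outputs task A for both targets, as a model trained on the duplicated labels would tend to. Against the duplicated labels its task B error is zero; against the correct labels the error grows only by the difference between the two DFT runs.

```python
import numpy as np

# Toy labels (NOT real QM8 values) for 4 molecules and 2 hypothetical tasks.
# task_a and task_b mimic one DFT quantity computed with two basis sets,
# so their values are close but not identical.
task_a = np.array([0.20, 0.35, 0.27, 0.18])
task_b = np.array([0.22, 0.36, 0.29, 0.19])

# Buggy dataset: the task_b column was overwritten with task_a's values.
buggy_b = task_a.copy()

# A model trained on the buggy data learns to output task_a for both
# targets; here it reproduces task_a exactly to keep the example simple.
pred_b = task_a.copy()


def mae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))


mae_b_buggy = mae(buggy_b, pred_b)    # scored against the duplicated labels
mae_b_correct = mae(task_b, pred_b)   # scored against the true task_b labels

print(f"task_B MAE vs buggy labels:   {mae_b_buggy:.3f}")
print(f"task_B MAE vs correct labels: {mae_b_correct:.3f}")
# In this toy setup pred_b equals task_a exactly, so the corrected error
# is just the mean basis-set difference between the two tasks.
print(f"mean |task_B - task_A|:       {mae(task_b, task_a):.3f}")
```

With real models the predictions will not match task A exactly, but the same idea applies: the shift in the task B score is bounded by how far the two DFT runs differ.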
