String kernels can exploit biases in SMILES string format, skewing performance metrics

Just a heads up on this issue:
https://github.com/deepchem/moleculenet/issues/15

I propose that string-based classifiers canonicalize SMILES strings before processing, so that estimates of performance, CI, etc. are not confounded by the way each string happens to be written.
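As a rough sketch of what that preprocessing step could look like, assuming RDKit is the canonicalizer (the issue does not prescribe one) and with `rdkit_canonical_smiles` and `canonicalize_dataset` being hypothetical helper names made up for illustration:

```python
from typing import Callable, Iterable, List, Optional


def rdkit_canonical_smiles(smiles: str) -> Optional[str]:
    """Return RDKit's canonical SMILES, or None if the string cannot be parsed."""
    # Assumption: RDKit is available. Imported lazily so the rest of this
    # sketch still loads when it is not installed.
    from rdkit import Chem

    mol = Chem.MolFromSmiles(smiles)  # returns None on a parse failure
    if mol is None:
        return None
    return Chem.MolToSmiles(mol)  # canonical output is the default


def canonicalize_dataset(
    smiles_list: Iterable[str],
    canonicalize: Callable[[str], Optional[str]] = rdkit_canonical_smiles,
) -> List[str]:
    """Canonicalize every string, raising on any that cannot be parsed."""
    canonical = []
    for smi in smiles_list:
        result = canonicalize(smi)
        if result is None:
            raise ValueError(f"could not parse SMILES: {smi!r}")
        canonical.append(result)
    return canonical
```

With RDKit installed, different spellings of the same molecule (for example `OCC` and `C(O)C` for ethanol) should come out as one identical string, so a string kernel no longer sees the original writing convention. Applying the same function to the train and test splits keeps the representation consistent across both.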