feat: add a PyTorch backend #541
Draft
I've become familiar with PyTorch recently through writing https://github.com/hsf-training/deep-learning-intro-for-hep/
I've also been looking at the Vector documentation because I think it needs an overhaul to be more physicist-friendly. Along the way, I noticed that there's no PyTorch backend yet, but it would be really useful to have one. Vector's approach to NumPy arrays is to expect them to be structured arrays, but feature vectors in an ML model are always unstructured. (Note: there's a conversion function, `np.lib.recfunctions.structured_to_unstructured`.)
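For reference, here's what that conversion looks like on a small structured array:

```python
import numpy as np
from numpy.lib.recfunctions import structured_to_unstructured

# a structured array with named fields, the form Vector's NumPy backend expects
structured = np.array(
    [(1.0, 2.0), (3.0, 4.0)],
    dtype=[("x", np.float64), ("y", np.float64)],
)

# the unstructured form an ML model would consume: plain 2D floats
unstructured = structured_to_unstructured(structured)
print(unstructured.shape)  # (2, 2)
```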
Generally, feature vectors in an ML model will have a few indexes corresponding to vector coordinates and many others that don't. If the first 4 features are $p_T$, $\eta$, $\phi$, and mass, we might want to denote that with `pt_index=0`, `eta_index=1`, `phi_index=2`, `mass_index=3`, in such a way that the coordinates can be picked out of a tensor named `features`. It would be nice if the `features` vector were a subclass of `torch.Tensor` that produces the above via properties. And then if someone asks for a derived coordinate like `features.pz`, it would compute $p_z$ using the appropriate compute function.
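As a rough sketch of what that subclass could look like (the class name, the fixed class-level indexes, and the inline $p_z = p_T \sinh\eta$ formula are all hypothetical choices for illustration, not Vector's API; a real backend would route the math through the compute functions):

```python
import math
import torch

class MomentumFeatures(torch.Tensor):
    """Hypothetical: a feature tensor whose last axis holds
    (pt, eta, phi, mass) at fixed, class-level indexes."""

    pt_index, eta_index, phi_index, mass_index = 0, 1, 2, 3

    @property
    def pt(self):
        return self[..., self.pt_index]

    @property
    def eta(self):
        return self[..., self.eta_index]

    @property
    def phi(self):
        return self[..., self.phi_index]

    @property
    def mass(self):
        return self[..., self.mass_index]

    @property
    def pz(self):
        # p_z = p_T * sinh(eta); in a real backend this would call
        # the appropriate vector._compute function with lib=torch
        return self.pt * torch.sinh(self.eta)

# one muon-like feature row: pt=50 GeV, eta=1.2, phi=0.5, mass=0.105 GeV
features = torch.tensor([[50.0, 1.2, 0.5, 0.105]]).as_subclass(MomentumFeatures)
```

In practice the indexes would have to be per-instance attributes rather than class constants, which is part of the wrapping complexity discussed below.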
With `torch` as the `lib` argument of the `vector._compute` functions, they would all be autodiffed and could be used in an optimization procedure with backpropagation. The library functions that `vector._compute` needs,

vector/tests/test_compute_features.py
Lines 357 to 380 in 7cd311d

are all defined in the `torch` module, so they probably don't even need a shim (which SymPy needed).
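A minimal illustration of both points (`xy_rho` below is my own stand-in with the same shape as a `vector._compute` function, not code from Vector itself): all math goes through `lib`, so passing `lib=torch` makes the result differentiable for free.

```python
import torch

def xy_rho(lib, x, y):
    # stand-in shaped like a vector._compute function:
    # every math call goes through the `lib` argument
    return lib.sqrt(x**2 + y**2)

x = torch.tensor(3.0, requires_grad=True)
y = torch.tensor(4.0, requires_grad=True)

rho = xy_rho(torch, x, y)   # 5.0
rho.backward()              # gradients flow through the compute function

# d(rho)/dx = x/rho = 0.6, d(rho)/dy = y/rho = 0.8
print(x.grad, y.grad)
```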
Below is the start of an implementation, using https://pytorch.org/docs/stable/notes/extending.html#extending-torch-python-api as a guide. PyTorch defines a
PyTorch defines a `__torch_function__` method (see this investigation), making it possible to overload without even creating real subclasses of `torch.Tensor`, but I think it's a good idea to make subclasses of `torch.Tensor` because these are mostly-normal feature vectors: they just have a few extra properties and methods.

But then I got to the point where I'd have to wrap all of the functions and remembered that that's where all of the complexity is. Some functions (possibly methods or properties) take one input vector and return a non-vector, others return a vector, and some take two input vectors with both kinds of output. I don't think there are any functions that take more than two, but there are some functions that don't do anything to the vector properties, like a PyTorch function that moves data to and from the GPU or changes its dtype. (Possible simplification: maybe all vector components could be forced to be `float32`?)
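One way to sketch that classification with `__torch_function__` (the `_passthrough` set and the demote-to-plain-`Tensor` policy here are hypothetical choices for illustration, not Vector's design): operations known to leave the component layout alone keep the subclass, and everything else falls back to a plain `torch.Tensor`.

```python
import torch

class VecTensor(torch.Tensor):
    # hypothetical: only these ops leave the component layout intact,
    # so only they are allowed to keep the VecTensor subclass
    _passthrough = {torch.Tensor.to, torch.Tensor.clone, torch.Tensor.float}

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        if kwargs is None:
            kwargs = {}
        # run the operation as if everything were a plain Tensor
        out = torch.Tensor.__torch_function__(func, (torch.Tensor,), args, kwargs)
        if func in cls._passthrough and isinstance(out, torch.Tensor):
            return out.as_subclass(cls)  # layout untouched: keep vector-ness
        return out                       # anything else: plain torch.Tensor

v = torch.ones(2, 4).as_subclass(VecTensor)
print(type(v.clone()).__name__)  # stays VecTensor
print(type(v.sum()).__name__)    # demoted to Tensor
```

A real implementation would also have to handle the vector-in/vector-out and two-vector cases described above, which is where the real work is.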
Some of the functions will have to shuffle the indexes to make them line up. Say, for instance, that you have `featuresA` with `x_index=0, y_index=1` and `featuresB` with `x_index=4, y_index=2`. When you add `featuresA + featuresB`, you'll need to pass the correctly aligned components into the `vector._compute.planar.add.dispatch` function.

So that's where I left the implementation: a sketch of the idea of interpreting the `axis=-1` dimension of feature arrays as vector components, passing `torch` as the compute functions' `lib`. Considering that each of the different types of functions has to be handled differently before calling the compute functions, this is not as easy as I thought (a one-day project), but it's still not a huge project. I'd also like to find out whether there's a "market" for this backend: I had assumed that spatial and momentum vector calculations would be useful as (the first) part of an ML model, but I wonder if anyone has known use-cases.

Also, I have to say that the ML "vector" and "tensor" terminology is incredibly confusing in this context. When we say that a feature-set has 2D, 3D, or 4D spatial or momentum vector components, we have to be sure not to call that feature-set a "feature vector," since that's a different thing.
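To make the index-shuffling point concrete, here's a toy version (`add_xy` is my own stand-in for the math behind `vector._compute.planar.add`, not Vector's actual function): the coordinates have to be picked out of each tensor's own indexes before the compute function ever sees them.

```python
import torch

def add_xy(lib, x1, y1, x2, y2):
    # stand-in for the planar-add math: coordinates arrive already lined up
    return x1 + x2, y1 + y2

# featuresA stores (x, y) at indexes (0, 1); featuresB at (4, 2)
featuresA = torch.tensor([[1.0, 2.0]])
featuresB = torch.tensor([[0.0, 0.0, 20.0, 0.0, 10.0]])

x, y = add_xy(
    torch,
    featuresA[..., 0], featuresA[..., 1],  # x_index=0, y_index=1
    featuresB[..., 4], featuresB[..., 2],  # x_index=4, y_index=2 (shuffled)
)
print(x, y)  # x = 1 + 10, y = 2 + 20
```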