Releases: TorchJD/torchjd
v0.8.1
This patch simply adds the __all__ field that some tools (like Pylance) rely on to determine what our public API is, and to help with imports.
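For context, here is a minimal sketch of what such a declaration looks like in a package's `__init__.py` (the module and names below are hypothetical, not the actual contents of torchjd's packages):

```python
# Hypothetical package __init__.py, for illustration only.
from ._internal import public_function, PublicClass

# __all__ declares the public API explicitly, so static tools like
# Pylance stop warning that re-exported names are not exported.
__all__ = ["public_function", "PublicClass"]
```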
Changelog
Added
- Added `__all__` in the `__init__.py` of packages. This should prevent Pylance from triggering warnings when importing from `torchjd`.
v0.8.0
🔥 Autogram: a new engine for Jacobian descent 🔥
After months of hard work, we're happy to release autogram: a new engine to compute the Gramian G = J @ J.T of the Jacobian J of the losses with respect to the model parameters.
Have you ever had memory issues while using TorchJD? Try out this new approach!
This Gramian is computed iteratively, while only having parts of J in memory at a time, so it is much more memory-efficient than computing the full Jacobian and multiplying it by its transpose.
Why does the Gramian of the Jacobian matter?
Most aggregators simply return a weighted combination of the rows of the Jacobian, with weights that depend only on the Gramian of the Jacobian. So while standard Jacobian descent computes the Jacobian J and aggregates it into a vector to update the model, Gramian-based Jacobian descent directly computes the Gramian of the Jacobian, extracts weights from it, and backpropagates the corresponding weighted combination of the losses.
This is equivalent to standard Jacobian descent, but much more memory efficient, because the Jacobian never has to be fully stored in memory. It's thus also typically much faster, especially for instance-wise risk minimization (IWRM). For more theoretical justifications, please read Section 6 of our paper.
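Schematically (a rough sketch of the argument above, using standard notation rather than the paper's exact statement):

```latex
% J \in \mathbb{R}^{m \times n}: Jacobian of the m losses \ell_1, ..., \ell_m
% with respect to the n model parameters \theta.
% G = J J^{\top} \in \mathbb{R}^{m \times m}: its Gramian.
w = w(G) \in \mathbb{R}^{m},
\qquad
J^{\top} w \;=\; \nabla_{\theta} \Big( \sum_{i=1}^{m} w_i \, \ell_i \Big).
```

Since the weights depend only on G, and since J^T w is just the gradient of a weighted sum of the losses, the update can be obtained from one Gramian computation followed by one ordinary backward pass, without ever storing J.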
How to make the switch?
Old engine (autojac):
```python
from torchjd.autojac import backward
from torchjd.aggregation import UPGrad

aggregator = UPGrad()

# Repeat for several iterations
output = model(input)
losses = criterion(output, target)
optimizer.zero_grad()
backward(losses, aggregator=aggregator)
optimizer.step()
```

New engine (autogram):
```python
from torchjd.autogram import Engine
from torchjd.aggregation import UPGradWeighting

weighting = UPGradWeighting()
engine = Engine(model, batch_dim=0)  # Use batch_dim=None if not doing IWRM.

# Repeat for several iterations
output = model(input)
losses = criterion(output, target)
optimizer.zero_grad()
gramian = engine.compute_gramian(losses)
weights = weighting(gramian)
losses.backward(weights)
optimizer.step()
```

We're still working on making the engine even faster, but with this release you can already start using it. The interface is likely to change in the future, but adapting to these changes should always be easy!
Please open issues if you run into any problems while using it or if you have suggestions for improvements!
Changelog
Added
- Added the `autogram` package, with the `autogram.Engine`. This is an implementation of Algorithm 3 from Jacobian Descent for Multi-Objective Optimization, optimized for batched computations, as in IWRM. Generalized Gramians can also be obtained by using the autogram engine on a tensor of losses of arbitrary shape.
- For all `Aggregator`s based on the weighting of the Gramian of the Jacobian, made their `Weighting` class public. It can be used directly on a Gramian (computed via the `autogram.Engine`) to extract some weights. The list of new public classes is: `Weighting` (abstract base class), `UPGradWeighting`, `AlignedMTLWeighting`, `CAGradWeighting`, `ConstantWeighting`, `DualProjWeighting`, `IMTLGWeighting`, `KrumWeighting`, `MeanWeighting`, `MGDAWeighting`, `PCGradWeighting`, `RandomWeighting`, `SumWeighting`.
- Added `GeneralizedWeighting` (base class) and `Flattening` (implementation) to extract tensors of weights from generalized Gramians.
- Added usage example for IWRM with autogram.
- Added usage example for IWRM with partial autogram.
- Added usage example for IWMTL with autogram.
- Added Python 3.14 classifier in pyproject.toml (we now also run tests on Python 3.14 in the CI).
Changed
- Removed an unnecessary internal reshape when computing Jacobians. This should have no effect but a slight performance improvement in `autojac`.
- Revamped documentation.
- Made `backward` and `mtl_backward` importable from `torchjd.autojac` (like it was prior to 0.7.0).
- Deprecated importing `backward` and `mtl_backward` from `torchjd` directly.
v0.7.0
⚡ Performance update ⚡
In this release, we updated torchjd to remove some unnecessary overhead in the internal code. This should lead to small but noticeable performance improvements (up to 10% faster).
We have also made torchjd more lightweight by making some dependencies optional: they were only used by CAGrad and NashMTL (the changelog explains how to keep installing them).
We have also fixed all internal type errors thanks to mypy, and we have added a py.typed file so mypy can be used downstream.
Changelog
Changed
- BREAKING: Changed the dependencies of `CAGrad` and `NashMTL` to be optional when installing TorchJD. Users of these aggregators will have to use `pip install torchjd[cagrad]`, `pip install torchjd[nash_mtl]` or `pip install torchjd[full]` to install TorchJD alongside those dependencies. This should make TorchJD more lightweight.
- BREAKING: Made the aggregator modules and the `autojac` package protected. The aggregators must now always be imported via their package (e.g. `from torchjd.aggregation.upgrad import UPGrad` must be changed to `from torchjd.aggregation import UPGrad`). The `backward` and `mtl_backward` functions must now always be imported directly from the `torchjd` package (e.g. `from torchjd.autojac.mtl_backward import mtl_backward` must be changed to `from torchjd import mtl_backward`).
- Removed the check that the input Jacobian matrix provided to an aggregator does not contain `nan`, `inf` or `-inf` values. This check was costly in memory and time for large matrices, so removing it should improve performance. However, if the optimization diverges for some reason (for instance due to a learning rate that is too large), the resulting exceptions may come from other sources.
- Removed some runtime checks on the shapes of the internal tensors used by the `autojac` engine. This should lead to a small performance improvement.
Fixed
- Made some aggregators (`CAGrad`, `ConFIG`, `DualProj`, `GradDrop`, `IMTLG`, `NashMTL`, `PCGrad` and `UPGrad`) raise a `NonDifferentiableError` whenever one tries to differentiate through them. Before this change, trying to differentiate through them led to wrong gradients or unclear errors.
Added
- Added a `py.typed` file so that mypy (and other type checkers) can be used downstream.
v0.6.0
🌱 Spring cleaning 🌷
In this release we went over the whole codebase to improve it, test it better, and fix a few issues. This should lead to some small performance improvements (in particular with the UPGrad and DualProj aggregators). This has also made us reach 100% code coverage, which should decrease the risk of introducing bugs in future updates.
Our next priority is to make torchjd faster and more memory-efficient. If you are curious about the development of torchjd, or if you want to discuss ideas involving it, feel free to join our new Discord server: https://discord.com/invite/76KkRnb3nk
Changelog
Added
- Added usage example showing how to combine TorchJD with automatic mixed precision (AMP).
Changed
- Refactored the underlying optimization problem that `UPGrad` and `DualProj` have to solve to project onto the dual cone. This should slightly improve the performance and precision of these aggregators.
- Refactored internal verifications in the `autojac` engine so that they do not run at runtime anymore. This should minimally improve the performance and reduce the memory usage of `backward` and `mtl_backward`.
- Refactored internal typing in the `autojac` engine so that fewer casts are made and so that code is simplified. This should slightly improve the performance of `backward` and `mtl_backward`.
- Improved the implementation of `ConFIG` to be simpler and safer when normalizing vectors. It should slightly improve the performance of `ConFIG` and minimally affect its behavior.
- Simplified the normalization of the Gramian in `UPGrad`, `DualProj` and `CAGrad`. This should slightly improve their performance and precision.
Fixed
- Fixed an issue with `backward` and `mtl_backward` that could make the ordering of the columns of the Jacobians non-deterministic, and that could thus lead to slightly non-deterministic results with some aggregators.
- Removed arbitrary exception handling in `IMTLG` and `AlignedMTL` when the computation fails. In practice, this fix should only affect some matrices with extremely large values, which should not usually happen.
- Fixed a bug in `NashMTL` that made it fail (due to a type mismatch) when `update_weights_every` was more than 1.
v0.5.0
ConFIG support
This release adds the new ConFIG aggregator, recently proposed in ConFIG: Towards Conflict-free Training of Physics Informed Neural Networks.
To import and instantiate it:
```python
from torchjd.aggregation import ConFIG

aggregator = ConFIG()
```

It can then be passed to the `torchjd.backward` or `torchjd.mtl_backward` function, as usual. For more details, check out the documentation.
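For instance, with `backward` (a minimal sketch, assuming a `model`, `criterion` and `optimizer` defined as in a usual training loop):

```python
from torchjd import backward
from torchjd.aggregation import ConFIG

aggregator = ConFIG()

output = model(input)
losses = criterion(output, target)

optimizer.zero_grad()
backward(losses, aggregator)  # aggregate the Jacobian of the losses with ConFIG
optimizer.step()
```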
Thanks a lot to @qiauil (main author and maintainer of the official ConFIG repo) for his help with the integration, and to @ogencoglu for suggesting the integration of ConFIG in TorchJD!
v0.4.2
Official Python 3.13 support 🥳
Since PyTorch 2.6 is now out with Python 3.13 support, we added Python 3.13 to the list of Python versions tested in the CI!
This patch makes our compatibility with Python 3.13 official by updating the metadata in pyproject.toml.
Changelog
Added
- Added Python 3.13 classifier in pyproject.toml (we now also run tests on Python 3.13 in the CI).
v0.4.1
Bug fix
This patch fixes a bug introduced in the (yanked) v0.4.0 which could cause backward and mtl_backward to fail on some specific tensor shapes.
Changelog
Fixed
- Fixed a bug introduced in v0.4.0 that could cause `backward` and `mtl_backward` to fail with some tensor shapes.
v0.4.0
Sequential differentiation improvements
This version provides some improvements to how backward and mtl_backward differentiate when parallel_chunk_size is such that not all tensors can be differentiated in parallel at once (for instance if parallel_chunk_size=2 but you have 3 losses).
In particular, when a single tensor has to be differentiated (e.g. when using parallel_chunk_size=1), we now avoid relying on torch.vmap, which has several issues.
The parameter retain_graph of backward and mtl_backward has also been changed to be only used during the last differentiation. In most cases, you can now simply use the default retain_graph=False (prior to this change, you had to use retain_graph=True if the differentiations were not all made in parallel at once). This should provide some improvements in terms of memory overhead.
Lastly, this update enables the usage of torchjd for training recurrent neural networks. As @lth456321 discovered, there can be an incompatibility between torch.vmap and torch.nn.RNN when running on CUDA. With this update, you can now simply set the parallel_chunk_size to 1 to avoid using torch.vmap and fix the problem. A usage example for RNNs has therefore been added to the documentation.
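As a rough illustration (assuming a recurrent `model`, a `criterion`, an `optimizer` and batched `input`/`target` tensors defined as usual; the exact keyword arguments are documented in this release):

```python
from torchjd import backward
from torchjd.aggregation import UPGrad

aggregator = UPGrad()

output = model(input)               # e.g. a torch.nn.RNN running on CUDA
losses = criterion(output, target)  # a tensor containing several losses

optimizer.zero_grad()
# parallel_chunk_size=1 differentiates one loss at a time, bypassing
# torch.vmap and its incompatibility with cuDNN RNNs.
backward(losses, aggregator, parallel_chunk_size=1)
optimizer.step()
```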
Changelog
Changed
- Changed how the Jacobians are computed when calling `backward` or `mtl_backward` with `parallel_chunk_size=1` to not rely on `torch.vmap` in this case. Whenever `vmap` does not support something (compiled functions, RNN on CUDA, etc.), users should now be able to avoid using `vmap` by calling `backward` or `mtl_backward` with `parallel_chunk_size=1`.
- Changed the effect of the parameter `retain_graph` of `backward` and `mtl_backward`. When set to `False`, it now frees the graph only after all gradients have been computed. In most cases, users should now leave the default value `retain_graph=False`, no matter what the value of `parallel_chunk_size` is. This will reduce the memory overhead.
Added
- RNN training usage example in the documentation.
v0.3.1
Performance improvement patch
This patch improves the performance of the function finding the default tensors with respect to which backward and mtl_backward should differentiate. We thank @austen260 for finding the source of the performance issue and for proposing a working solution.
Changelog
Changed
- Improved the performance of the graph traversal function called by `backward` and `mtl_backward` to find the tensors with respect to which differentiation should be done. It now visits every node at most once.
Contributors
- @austen260
- @PierreQuinton
- @ValerianRey
v0.3.0
The interface update
This version greatly improves the interface of backward and mtl_backward, at the cost of some easy-to-fix breaking changes (some parameters of these functions have been renamed, or their order has been swapped due to becoming optional).
Downstream changes to make to keep using backward and mtl_backward:
- Rename `A` to `aggregator`, or pass it as a positional argument.
- For `backward`, unless you specifically want to avoid differentiating with respect to some parameters, you can now simply use the default value of the `inputs` argument.
- For `mtl_backward`, unless you want to customize which params should be updated with a step of JD and which should be updated with a step of GD, you can now simply use the default value of the `shared_params` and of the `tasks_params` arguments.
- If you keep providing the `inputs` or the `shared_params` or `tasks_params` arguments as positional arguments, you should provide them after the aggregator.

For instance, `backward(tensors, inputs, A=aggregator)` should become `backward(tensors, aggregator)`, and `mtl_backward(losses, features, tasks_params, shared_params, A=aggregator)` should become `mtl_backward(losses, features, aggregator)`.

We thank @raeudigerRaeffi for sharing his idea of having default values for the tensors with respect to which the differentiation should be made in `backward` and `mtl_backward`, and for implementing the first working version of the function that automatically finds these parameters from the autograd graph.
Changelog
Added
- Added a default value to the `inputs` parameter of `backward`. If not provided, the `inputs` will default to all leaf tensors that were used to compute the `tensors` parameter. This is in line with the behavior of `torch.autograd.backward`.
- Added a default value to the `shared_params` and to the `tasks_params` arguments of `mtl_backward`. If not provided, the `shared_params` will default to all leaf tensors that were used to compute the `features`, and the `tasks_params` will default to all leaf tensors that were used to compute each of the `losses`, excluding those used to compute the `features`.
- Note in the documentation about the incompatibility of `backward` and `mtl_backward` with tensors that retain grad.
Changed
- BREAKING: Changed the name of the parameter `A` to `aggregator` in `backward` and `mtl_backward`.
- BREAKING: Changed the order of the parameters of `backward` and `mtl_backward` to make it possible to have a default value for `inputs` and for `shared_params` and `tasks_params`, respectively. Usages of `backward` and `mtl_backward` that rely on the order between arguments must be updated.
- Switched to the PEP 735 dependency groups format in `pyproject.toml` (from a `[tool.pdm.dev-dependencies]` to a `[dependency-groups]` section). This should only affect development dependencies.
Fixed
- BREAKING: Added a check in `mtl_backward` to ensure that `tasks_params` and `shared_params` have no overlap. Previously, the behavior in this scenario was quite arbitrary.