Computational complexity of a hessian-vector product #32285
-
Could you share a code snippet showing exactly what you're doing?
-
Thank you for the quick answer! Here is a minimal example that reproduces the behaviour I am looking at:

```python
import jax
import jax.numpy as jnp


def cost_fn(x1: jnp.ndarray, x2: jnp.ndarray, x3: jnp.ndarray) -> float:
    # some tensor contractions
    matrix = x1
    d = 20
    for _ in range(d):
        matrix = matrix @ x1
    return (x2 @ matrix @ x3).real


key = jax.random.PRNGKey(42)
key1, key2, key3, key4, key5, key6 = jax.random.split(key, 6)

# tensor dimensions
n1 = int(2 * 1e2)
n2 = int(1e2)
n3 = int(1e2)

# Complex matrix and vectors
x1 = jax.random.normal(key1, shape=(n1 // 2, n1 // 2)) + 1j * jax.random.normal(key2, shape=(n1 // 2, n1 // 2))
x2 = jax.random.normal(key3, shape=(n2,)) + 1j * jax.random.normal(key4, shape=(n2,))
x3 = jax.random.normal(key5, shape=(n3,)) + 1j * jax.random.normal(key6, shape=(n3,))

# Create tangent vectors for JVP calculations
v1 = jax.random.normal(key1, shape=(n1 // 2, n1 // 2)) + 1j * jax.random.normal(key2, shape=(n1 // 2, n1 // 2))
v2 = jax.random.normal(key3, shape=(n2,)) + 1j * jax.random.normal(key4, shape=(n2,))
v3 = jax.random.normal(key5, shape=(n3,)) + 1j * jax.random.normal(key6, shape=(n3,))

# Define individual functions, each depending on a single argument
f_1 = lambda x: cost_fn(x, x2, x3)
grad_x1 = jax.grad(f_1)
f_2 = lambda x: cost_fn(x1, x, x3)
grad_x2 = jax.grad(f_2)
f_3 = lambda x: cost_fn(x1, x2, x)
grad_x3 = jax.grad(f_3)
```

From trying to reproduce this behaviour, it seems like the difference in the times given by …
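For reference, here is a sketch of how the three Hessian-vector products could be timed on top of the snippet above (the `time_hvp` helper and the repeat count are illustrative choices, not the exact measurement code):

```python
import time

import jax


def time_hvp(f, x, v, n_repeats=10):
    """Average wall-clock time of a forward-over-reverse Hessian-vector product."""
    # JVP of the gradient of f, evaluated at x along the tangent direction v
    hvp_fn = jax.jit(lambda x, v: jax.jvp(jax.grad(f), (x,), (v,))[1])
    hvp_fn(x, v).block_until_ready()  # compile and warm up before timing
    start = time.perf_counter()
    for _ in range(n_repeats):
        hvp_fn(x, v).block_until_ready()
    return (time.perf_counter() - start) / n_repeats


for name, f, x, v in [("x1", f_1, x1, v1), ("x2", f_2, x2, v2), ("x3", f_3, x3, v3)]:
    print(f"HVP w.r.t. {name}: {time_hvp(f, x, v):.4f} s per call")
```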
-
Hey there!
I have a theoretical question:
I am currently computing the first and second order approximations of a scalar-valued cost function:
$f(x_1, x_2, x_3)\colon \mathbb{C}^{d_1} \times \mathbb{C}^{d_2} \times \mathbb{C}^{d_3} \to \mathbb{R}$, where $d_1 \gg d_2, d_3$.
In particular, I am using `jax.jvp(jax.grad(g_i), (x_i,), (v_i,))` to compute the Hessian-vector product $\partial^2 g_i(x_i) \cdot [v_i]$, where $g_i$ is obtained from $f$ by fixing all the other arguments $x_j$ with $j \neq i$. Based on automatic differentiation theory (e.g. The Elements of Differentiable Programming, Blondel & Roulet), I would expect a complexity that depends linearly on the complexity $C$ of evaluating the cost function, as also mentioned in your autodiff cookbook.
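Concretely, this is the pattern I mean (a minimal sketch; the names `hvp` and `g_1` are only for illustration):

```python
import jax


def hvp(g, x, v):
    # Forward-over-reverse: a JVP of the gradient of g gives the Hessian-vector product.
    return jax.jvp(jax.grad(g), (x,), (v,))[1]


# e.g. differentiating f only with respect to its first argument:
# g_1 = lambda x: f(x, x2, x3)
# hvp(g_1, x1, v1)
```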
However, when computing the Hessian-vector product for each $x_i$, I also see a significant dependence on which $x_i$ I am differentiating with respect to, as their sizes vary considerably.
Is this in agreement with the theory of automatic differentiation, or is it due to some detail of the implementation? If it is related to the theory, I would greatly appreciate it if you know of a source where this is discussed and can share it.
Thank you in advance!