I'm trying to compute the NTK kernel, where the labels are k-dimensional vectors. But no matter how I change the width of the output layer, the dimension of the NTK kernel is always |D| * |D|, where |D| is the number of inputs, instead of k|D| * k|D| as suggested in the line before equation (4) of the paper. Does anyone know why the difference?
Note that `kernel_fn` computes the (infinite-width limit of the) covariance of the outputs (nngp) or of their Jacobians (ntk). Both outputs and Jacobians are i.i.d. along the output `channel_axis` (of size 2 in your example), hence the k|D| * k|D| covariance is constant-block-diagonal along the pair of k dimensions, and the full covariance is the Kronecker product of the kernel and the identity matrix: kernel_{|D| * |D|} \otimes I_{k * k}. For this reason we only compute the non-trivial, replicated |D| * |D| kernel block.

Non-i.i.d. dimensions are preserved, so if e.g. your NN outputs CNN outputs of size |D|, H, k, the output kernel will have shape |D|, |D|, H, H (note that pairs of dimensions ar…)

Finally, single-sample, finite-width NTK/NNGP (https://neural-tangents.readthedocs.io/en/latest/empirical.html) are not block-diagonal, so you can get the full k|D| * k|D| kernel.
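To make the Kronecker-product structure concrete, here is a minimal NumPy sketch (not neural-tangents itself): the sizes |D| = 3 and k = 2 are made up for illustration, and `K` just stands in for the |D| * |D| block that `kernel_fn` would return.

```python
import numpy as np

# Hypothetical sizes: |D| = 3 inputs, k = 2 output channels.
D, k = 3, 2

# Stand-in for the |D| x |D| kernel block (a symmetric PSD matrix).
rng = np.random.default_rng(0)
A = rng.normal(size=(D, D))
K = A @ A.T

# Because outputs are i.i.d. along the channel axis, the full
# k|D| x k|D| covariance is the Kronecker product K ⊗ I_k.
K_full = np.kron(K, np.eye(k))

assert K_full.shape == (k * D, k * D)
# Cross-channel entries vanish (block-diagonal along channel pairs):
assert K_full[0, 1] == 0.0
# Same-channel entries replicate the |D| x |D| block:
assert K_full[0, 2] == K[0, 1]
```

So the |D| * |D| block returned by `kernel_fn` carries all the information; expanding it to k|D| * k|D| is a mechanical `np.kron` with the identity.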