One user posted his code for an R-op here, but with no explanation of how to use it. Using the approach described in this blog post, I can compute a Jacobian-vector product for a single vector. What I'd like to be able to do is compute it for an entire batch of vectors (i.e., a matrix). Here is the code I'm using for the single-vector case:
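For reference, `MLP` in the snippet below is just my own small feed-forward module with a helper that counts its parameters. A minimal sketch of what it looks like (the layer sizes other than the 25-dim input are arbitrary, and `n_params()` is my own helper, not a PyTorch API):

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    """Toy feed-forward net; only the 25-dim input matters for the snippet below."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(25, 16),
            nn.Tanh(),
            nn.Linear(16, 10),
        )

    def forward(self, x):
        return self.net(x)

    def n_params(self):
        # Total number of scalar parameters in the network.
        return sum(p.numel() for p in self.parameters())
```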
```python
import torch

mlp = MLP()
batch_size = 1
x = torch.randn(batch_size, 25)
v = torch.randn(batch_size, mlp.n_params())

y = mlp(x)

# Dummy seed vector; create_graph below makes the VJP differentiable w.r.t. u.
u = torch.ones_like(y, requires_grad=True)

# VJP: J^T u, flattened and concatenated over all parameters.
vjp = torch.autograd.grad(y, mlp.parameters(), grad_outputs=u, create_graph=True)
vjp_flat = torch.cat([grad.view(-1) for grad in vjp])

# JVP: differentiating J^T u w.r.t. u and contracting with v gives J v.
# (grad_outputs must match vjp_flat's shape, hence the view.)
jvp = torch.autograd.grad(vjp_flat, u, grad_outputs=v.view(-1), retain_graph=False)
```
I've tried making u and v matrices (making the first dimension larger than 1), but the results are incorrect. I'm validating the JVP by checking it against the full Jacobian, obtained by sequentially calling backward on every element of the network output.
Does anyone know how I can accomplish this? I run into needing to compute this quantity quite a bit.