If I have a function like

```
f = lambda x: 2 * x
```

and I want to calculate the VJP when it is applied to a matrix, I can do

```
import torch
from torch.autograd.functional import vjp

point = torch.tensor([[1., 2., 3.], [4., 5., 6.]])
v = torch.tensor([[1., 2., 3.], [4., 5., 6.]])
vjp(f, point, v)[1]
# tensor([[ 2.,  4.,  6.],
#         [ 8., 10., 12.]])
```
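As a sanity check (assuming `torch.autograd.functional.vjp`, which takes the cotangent `v` as its third argument and returns an `(output, vjp)` pair), the result is simply `2 * v`, since `f` is linear with Jacobian `2I`:

```python
import torch
from torch.autograd.functional import vjp

f = lambda x: 2 * x
point = torch.tensor([[1., 2., 3.], [4., 5., 6.]])
v = torch.tensor([[1., 2., 3.], [4., 5., 6.]])

# vjp returns (f(point), pullback of v); for f(x) = 2x the pullback is 2 * v
out, grad = vjp(f, point, v)
print(torch.equal(grad, 2 * v))  # True
```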

But what is the equivalent product using the size (2, 3, 2, 3) Jacobian computed by

```
import torch
from torch.autograd.functional import jacobian

jac = jacobian(f, torch.tensor([[1., 2., 3.], [4., 5., 6.]]))
```
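A quick sketch (assuming `torch.autograd.functional.jacobian`) confirming the shape and structure of this Jacobian: each output element depends only on the matching input element, so flattening it gives `2 * I`:

```python
import torch
from torch.autograd.functional import jacobian

f = lambda x: 2 * x
jac = jacobian(f, torch.tensor([[1., 2., 3.], [4., 5., 6.]]))

print(jac.shape)  # torch.Size([2, 3, 2, 3])
# jac[i, j, k, l] = d f(x)[i, j] / d x[k, l], which is 2 when (i, j) == (k, l)
print(torch.equal(jac.reshape(6, 6), 2 * torch.eye(6)))  # True
```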

The obvious ways of multiplying the Jacobian and the vector do not give the same result as the VJP:

```
# all return rank-4 tensors, not the (2, 3) VJP
jac @ v.T   # shape (2, 3, 2, 2)
v @ jac.T   # shape (3, 2, 2, 2)
v.T @ jac   # shape (2, 3, 3, 3)
jac.T @ v   # shape (3, 2, 3, 3)
```
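A sketch checking that claim (assuming `torch.autograd.functional` for both helpers): `@` contracts only a single dimension, so each product above keeps four dimensions, while the VJP has the shape of the input, `(2, 3)`:

```python
import torch
from torch.autograd.functional import jacobian, vjp

f = lambda x: 2 * x
point = torch.tensor([[1., 2., 3.], [4., 5., 6.]])
v = torch.tensor([[1., 2., 3.], [4., 5., 6.]])

jac = jacobian(f, point)
jacT = jac.permute(3, 2, 1, 0)  # same as jac.T (all dims reversed); .T on 4-D tensors is deprecated

# each matmul contracts one dimension, so every product stays rank 4
for p in (jac @ v.T, v @ jacT, v.T @ jac, jacT @ v):
    print(tuple(p.shape))

# whereas the VJP has the shape of the input
print(tuple(vjp(f, point, v)[1].shape))  # (2, 3)
```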