Suppose I have two functions, z = f(x) and g(z). I compute
z1 = f(x1)
z2 = f(x2)
and want to optimize MSE(∇g(z1), ∇g(z2)).
But when I backpropagate, the gradients of f's parameters are all zero.
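That is, with theta denoting f's parameters, the quantity I want to minimize is

loss(theta) = MSE(∇g(f(x1; theta)), ∇g(f(x2; theta))),

and (if I have the chain rule right) its dependence on theta enters only through the evaluation points z1 and z2, via

d/dtheta [∇g(f(x; theta))] = ∇²g(f(x; theta)) · df(x; theta)/dtheta,

so second derivatives of g are involved.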
The problem is that torch.autograd.functional.jacobian() only backpropagates back to
its inputs argument (your f(x1) and f(x2)). It neither knows nor cares that, say, f(x1)
depends on theta. Because of that, you never backpropagate through f(x1), never reach
the dependence on theta, and so get no .grad for theta.
I’ve tweaked your script so that the call to f(x) occurs inside batch_jacobian() and
therefore inside the call to torch.autograd.functional.jacobian(). Doing so does then
produce a .grad for theta.
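Since the tweaked script itself isn't reproduced here, the following is only a minimal sketch of that pattern; the toy f and g, the shapes, and the explicit create_graph=True are my assumptions:

```python
import torch

torch.manual_seed(0)

f = torch.nn.Linear(3, 4)          # placeholder for f(x; theta); theta = f.weight, f.bias
g = lambda z: (z ** 2).sum()       # placeholder for g(z)

x1 = torch.randn(3)
x2 = torch.randn(3)

def batch_jacobian(x):
    # f(x) is evaluated inside batch_jacobian(), so its graph back to theta exists here,
    # and create_graph=True keeps the returned Jacobian differentiable along that graph
    return torch.autograd.functional.jacobian(g, f(x), create_graph=True)

j1 = batch_jacobian(x1)            # dg/dz evaluated at z1 = f(x1), still attached to theta
j2 = batch_jacobian(x2)            # dg/dz evaluated at z2 = f(x2)

loss = torch.nn.functional.mse_loss(j1, j2)
loss.backward()

print(f.weight.grad)               # no longer all zero (in general)
```

With create_graph=True the Jacobian is computed in a differentiable manner, so the MSE between the two Jacobians backpropagates through the second derivative of g and through f back to theta.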