Yes, I think for the toy example above, Jacobian turns out to be enough. However, for my real use case, which involves an nn.Module, it isn't. As the thread "Get gradient and Jacobian wrt the parameters" suggests, we have to write a new function that takes the model parameters as its input, because Jacobian computes derivatives only w.r.t. the inputs of the function it is given. I will try that first. Thank you!
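For anyone reading along, here is a rough sketch of what I understand that approach to look like. It assumes PyTorch 2.0 or newer (for `torch.func.functional_call`). The `nn.Linear` model, the input `x`, and the helper name `forward_from_params` are placeholders I made up for illustration, not anything from the linked thread.

```python
import torch
from torch import nn
from torch.autograd.functional import jacobian
from torch.func import functional_call

torch.manual_seed(0)

# Placeholder model and input, only for illustration.
model = nn.Linear(3, 2)
x = torch.randn(4, 3)

# Pull the parameters out so they can be passed in as function inputs.
names = [name for name, _ in model.named_parameters()]
params = tuple(p.detach() for p in model.parameters())


def forward_from_params(*new_params):
    # Run the module with the given tensors substituted for its parameters,
    # so the output is a function of the parameters rather than of x.
    return functional_call(model, dict(zip(names, new_params)), (x,))


# jacobian differentiates w.r.t. the function's inputs, which are now the
# parameters. It returns one tensor per parameter, with shape
# output.shape + parameter.shape.
jacs = jacobian(forward_from_params, params)
jac_by_name = dict(zip(names, jacs))

for name, jac in jac_by_name.items():
    print(name, tuple(jac.shape))
```

The input `x` is captured by the closure and held fixed, so only the parameters are differentiated. For a larger model the per-parameter Jacobians can get big, since each one has the output shape prepended to the parameter shape.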