Hi! I am trying to backpropagate through the first derivative (Jacobian). I observed that memory usage keeps growing when I call y.backward(retain_graph=True, create_graph=True).

The conclusion from the issue you linked is that this is mostly expected behavior (or something we should forbid people from doing). torch.autograd.grad works for vector outputs as well. What issue do you run into when you try to use it?
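For reference, here is a minimal sketch of the torch.autograd.grad route for a vector output. The tensors x and y and the cubic function are made up for illustration and are not taken from the original code. Because the gradient is returned rather than accumulated into x.grad, this should avoid the reference cycle between a tensor and its .grad that the PyTorch docs warn about for backward(create_graph=True).

```python
import torch

# Illustrative input and vector-valued output (assumed, not from the original post).
x = torch.randn(3, requires_grad=True)
y = x ** 3  # shape (3,)

# First derivative as a vector-Jacobian product with a vector of ones.
# create_graph=True keeps the result differentiable.
(g,) = torch.autograd.grad(
    y, x, grad_outputs=torch.ones_like(y), create_graph=True
)  # g == 3 * x**2

# Differentiate the first derivative again; nothing is written to x.grad.
(h,) = torch.autograd.grad(g.sum(), x)  # h == 6 * x
```

The grad_outputs argument is what makes this work for a non-scalar y: it supplies the vector in the vector-Jacobian product.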

Hi, I ran into the same problem, but I want to backpropagate through the first derivative w.r.t. the network parameters. Since both torch.autograd.grad and torch.autograd.functional.jacobian only take vector inputs, while the network parameters are a tuple of tensors, is there a feasible way to do this? Thanks in advance for any help!
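One thing that may help here: the inputs argument of torch.autograd.grad accepts a sequence of tensors, so a tuple of parameters can be passed directly. Below is a rough sketch under that assumption; the small model, the random batch, and the gradient-norm penalty are placeholders chosen only to have something to differentiate twice.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder network and data (assumed for illustration).
model = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 1))
params = tuple(model.parameters())
x = torch.randn(16, 4)

loss = model(x).pow(2).mean()

# First derivative w.r.t. every parameter tensor, kept differentiable.
grads = torch.autograd.grad(loss, params, create_graph=True)

# Any scalar built from the first derivatives can be differentiated again.
penalty = sum(g.pow(2).sum() for g in grads)
second = torch.autograd.grad(penalty, params)
```

Both grads and second are tuples with one entry per parameter, in the same order as params.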

The post "Get gradient and Jacobian wrt the parameters" helps with computing the Jacobian, but I am trying to backpropagate further through the Jacobian. It would be really nice if PyTorch supported gradients w.r.t. PyTree objects, like JAX does.
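In case it is useful, the torch.func module (assuming a PyTorch version that ships it, 2.0 or later) offers JAX-style transforms that accept a dict of parameters. The sketch below is only an illustration: the model, the input x, and the helper names f and jac_norm are hypothetical, and the squared Jacobian norm is just one example of a scalar to differentiate.

```python
import torch
import torch.nn as nn
from torch.func import functional_call, grad, jacrev

torch.manual_seed(0)

# Placeholder network and a single input sample (assumed for illustration).
model = nn.Sequential(nn.Linear(3, 5), nn.Tanh(), nn.Linear(5, 2))
params = {name: p.detach() for name, p in model.named_parameters()}
x = torch.randn(3)

def f(p):
    # Run the model with the parameters supplied in the dict p.
    return functional_call(model, p, (x,))

def jac_norm(p):
    # Jacobian of the output w.r.t. each parameter, returned as a dict.
    jac = jacrev(f)(p)
    return sum(j.pow(2).sum() for j in jac.values())

jac = jacrev(f)(params)      # dict: name -> tensor of shape (2, *param.shape)
g = grad(jac_norm)(params)   # dict: name -> gradient of the Jacobian norm
```

The returned dicts use the same keys as params, so the structure of the parameters is preserved through both levels of differentiation.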