How to take gradients with respect to model parameters of a loss function that is already defined in terms of gradients with respect to the inputs?

Hello,

As far as I understand, this will work, but you need to set something up manually. See this thread, which computes a Hessian product and then calls backward on it. I think your case is similar, since you are also calling backward on a gradient.

So, have you tried the snippet above?
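
For reference, here is a minimal sketch of the "backward on a gradient" pattern. It assumes PyTorch, which the thread does not state explicitly, and the toy model, the input `x`, and the squared-norm loss are all made up for illustration, not taken from your code. The part that usually has to be set manually is `create_graph=True` in the first gradient call, so that the gradient itself stays differentiable.

```python
import torch

torch.manual_seed(0)

# Hypothetical toy model and data, for illustration only.
model = torch.nn.Sequential(
    torch.nn.Linear(3, 8),
    torch.nn.Tanh(),
    torch.nn.Linear(8, 1),
)
x = torch.randn(5, 3, requires_grad=True)  # inputs must track gradients

y = model(x).sum()

# First-order gradient of the output w.r.t. the inputs.
# create_graph=True records this gradient computation in the autograd graph,
# so the result can be differentiated a second time.
(grad_x,) = torch.autograd.grad(y, x, create_graph=True)

# A loss defined in terms of the input gradient (assumed: its squared norm).
loss = grad_x.pow(2).sum()

# Second backward pass: gradients of that loss w.r.t. the model parameters.
loss.backward()

for name, p in model.named_parameters():
    print(name, None if p.grad is None else tuple(p.grad.shape))
```

Without `create_graph=True`, `grad_x` would be detached from the graph and `loss.backward()` would fail because nothing requires grad. Also note that a parameter that does not influence the input gradient at all (the last layer's bias in this toy model) should not be expected to get a useful gradient from this loss.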