Gradient Computation as Inference

I have a rather obscure question I need help with (I'll be very surprised if someone can help me with it, so no worries if it doesn't make sense).

Suppose I seek a model f_(cur_x), parameterized by two groups of variables, theta and phi, which inherently induces correct gradients when backpropagating loss_(cur_x). Said another way, suppose I had access to target gradients, and suppose I viewed the computation of the gradient (the backward pass) as a type of inference procedure (just another forward function). Suppose this is possible because of the inclusion of params (phi) in the backward pass which are not present in a standard forward pass.

Is there a way to frame this so that I can minimize the error between the gradients induced by my model and the target gradients? Said another way, it's as if I need to call backward on the hypothetical output of my backward function. To be clear, I am not interested in using a hyper-network to generate gradients. I seek params (phi) which induce specific gradients for another set of params (theta) when they are included in the model. Am I simply talking about second-order derivatives here?
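
To try to formalize what I mean (g^* here is just my shorthand for the target gradients, and L for loss_(cur_x)), I think the objective would be something like

$$
\min_{\phi}\ \left\lVert \nabla_{\theta} L(x;\, \theta, \phi) - g^{*} \right\rVert^{2}
$$

so getting a gradient for phi out of this would seem to involve the mixed second derivative of L with respect to theta and then phi, which is what prompts the second-order question above.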

What am I missing?

I've reviewed some of the posts regarding extending autograd and custom backward functions, but I'm wondering if there is an easy way to do this: frame the backward call as a forward call which returns gradients. I should be able to measure the error between those grads and the target grads, and use that error to compute a gradient for phi, thereby minimizing the difference (rough sketch below).
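
Here is a minimal sketch of what I have in mind, in case it helps. All names and shapes are made up, and I'm assuming phi simply participates in the graph that produces theta's gradients (via create_graph=True), rather than entering through a custom autograd.Function:

```python
import torch
import torch.nn.functional as F

# Toy stand-ins (names and shapes are made up for illustration).
theta = torch.randn(10, requires_grad=True)   # "main" params
phi = torch.randn(10, requires_grad=True)     # params meant to shape theta's grads
x = torch.randn(32, 10)                       # some input batch
target_grads = torch.randn(10)                # the gradients I want theta to receive

opt_phi = torch.optim.SGD([phi], lr=1e-2)

for step in range(100):
    # Ordinary forward pass; phi has to appear somewhere in the graph feeding
    # the loss, otherwise d(loss)/d(theta) cannot depend on it.
    loss = ((x @ (theta * phi)) ** 2).mean()

    # "Backward as forward": compute theta's gradients as a differentiable
    # function of (theta, phi) by keeping the graph of the gradient itself.
    (grads_theta,) = torch.autograd.grad(loss, theta, create_graph=True)

    # Error between the induced gradients and the target gradients.
    grad_err = F.mse_loss(grads_theta, target_grads)

    # Second backward: differentiates grad_err w.r.t. phi
    # (a mixed second derivative of the original loss).
    opt_phi.zero_grad()
    grad_err.backward()
    opt_phi.step()

    # theta is deliberately never updated here; only phi is optimized.
```

If I understand correctly, the grad_err.backward() call here is the "backward on the output of my backward" I was trying to describe, and the resulting phi update is driven by a second-order derivative of the original loss.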

Does this make sense, or will it not work? Any feedback would be very much appreciated. Thanks.