Gradient question

Hello pytorch community,

Consider a neural network N_\theta(x) which outputs y. Then consider an iterative method F applied to y that outputs z.

I need to calculate a loss L(\theta) = |z_real - z_prediction|. However, the gradients cannot flow through F, since it is an iterative method (as far as my limited understanding goes).
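To make the setup concrete, here is roughly what my forward pass looks like (the network, the iterative method, and the shapes below are just placeholders):

```python
import torch

# Placeholder for N_theta; the actual architecture doesn't matter here,
# the point is just the structure x -> y -> z -> loss.
N_theta = torch.nn.Sequential(
    torch.nn.Linear(8, 8), torch.nn.Tanh(), torch.nn.Linear(8, 8)
)

def F(y):
    # Iterative method treated as a black box; autograd cannot see inside it.
    z = y.detach().clone()
    for _ in range(10):
        z = 0.5 * (z + torch.tanh(z))
    return z

x = torch.randn(4, 8)
z_real = torch.randn(4, 8)

y = N_theta(x)                      # y = N_theta(x)
z = F(y)                            # z = F(y); the graph is cut by the detach above
loss = (z_real - z).abs().mean()    # L(theta) = |z_real - z_prediction|
# loss.backward() would raise an error here, since loss no longer depends on theta
# through the autograd graph.
```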

If I were to write out the gradient flow from z back to \theta, I would have:

dz/d\theta = (dz/dy) (dy/d\theta)

What should I do with the term dz/dy? Is it possible to set it to 1 somehow?

Thanks in advance.

If F is backpropagable, then the gradients will flow through F.

Think of your big graph as:
y --> F_iter(y) --> z
But it can be decomposed into
y --> F(y) --> z_1 --> F(z_1) --> z_2 --> F(z_2) --> … --> z_n --> F(z_n) --> z
where you have all the upstream and downstream gradients.

And so dz/dy will be computed via the chain rule like any other op in PyTorch. You can in fact check this by calling your iterative function on a random tensor (with requires_grad=True), summing the output to obtain a scalar, backpropagating it to the random tensor, and checking whether it has a gradient or not.
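For instance, something along these lines (the iterative function here is just a made-up Newton-style loop, so all names are illustrative):

```python
import torch

def F_iter(y, n_steps=20):
    # Made-up iterative method built from ordinary tensor ops:
    # each step is differentiable, so autograd records the whole loop.
    z = y
    for _ in range(n_steps):
        z = 0.5 * (z + y / z)   # Newton-style iteration converging to sqrt(y)
    return z

y = (torch.rand(10) + 1.0).requires_grad_()   # random leaf tensor in [1, 2)
z = F_iter(y)

z.sum().backward()   # sum to a scalar, then backprop through the whole loop
print(y.grad)        # not None: dz/dy was computed by the chain rule
print(torch.allclose(y.grad, 0.5 / y.detach().sqrt()))  # matches d sqrt(y)/dy
```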

Unfortunately, F is not backpropagable in this case. I wonder:

  1. What would happen if I just set dz/dy to one (in a theoretical sense)?
  2. Is it possible to do that in PyTorch? Perhaps I could just get dy/d\theta and then multiply it by z_target - z_prediction (see the sketch below).
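Here is a rough sketch of what I mean, assuming F keeps the shape of y unchanged; the network and the iterative method below are placeholders, and the custom autograd.Function simply passes the incoming gradient through as if dz/dy were 1. Would something like this make sense?

```python
import torch

def my_iterative_method(y):
    # Stand-in for the real, non-differentiable iterative method (black box).
    z = y.clone()
    for _ in range(10):
        z = torch.tanh(z) + 0.1 * torch.sign(z)   # arbitrary placeholder updates
    return z

class IdentityGradF(torch.autograd.Function):
    """Runs the non-differentiable F in forward and pretends dz/dy = 1 in backward
    (assumes z has the same shape as y)."""

    @staticmethod
    def forward(ctx, y):
        with torch.no_grad():
            return my_iterative_method(y)

    @staticmethod
    def backward(ctx, grad_z):
        # dL/dy := dL/dz * 1, i.e. the incoming gradient is passed through unchanged
        return grad_z

# Usage: gradients reach theta even though F itself is a black box.
net = torch.nn.Linear(8, 8)         # placeholder for N_theta
x = torch.randn(4, 8)
z_real = torch.randn(4, 8)

y = net(x)
z = IdentityGradF.apply(y)
loss = (z_real - z).abs().mean()    # L(theta) = |z_real - z_prediction|
loss.backward()
print(net.weight.grad is not None)  # True
```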

Can you share a minimal reproducible example of this? It’ll be easier for people to help if they have a code snippet they can visualize and correct themselves!