Hello pytorch community,
Consider a neural network N_\theta(x) which outputs y. Then consider an iterative method F applied on y that outputs z.
I need to calculate a loss L(\theta) = |z_real - z_prediction|. However, the gradients cannot flow through F, since it is an iterative method (as far as my limited understanding goes).
If I were to write out the gradient flow from z back to \theta, I would have:
dz/d\theta = (dz/dy)(dy/d\theta)
What should I do with the term dz/dy? Is it possible to set it to 1 somehow?
Thanks in advance.
If F is backpropagable, then the gradients will flow through F.
Think of your big graph as:

[diagram: full computation graph]

But it can be decomposed into:

[diagram: graph split into stages]

where you have all the upstream and downstream gradients.
And so dz/dy will be computed via the chain rule like any other op in PyTorch. You can in fact check this by calling your iterative function on a random tensor that requires grad, summing the output to obtain a scalar, backpropagating, and then checking whether the input tensor has a gradient or not.
Unfortunately, F is not backpropagable in this case. I wonder:
- What would happen if I just set dz/dy to 1 (in a theoretical sense)?
- Is it possible to do that in PyTorch? Perhaps I could just get dy/d\theta and then multiply it by (z_target - z_prediction).
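Setting dz/dy to the identity like this is essentially a straight-through estimator, and PyTorch lets you express it with a custom torch.autograd.Function. A minimal sketch, where torch.round is only a hypothetical placeholder for the non-backpropagable F:

```python
import torch

class StraightThroughF(torch.autograd.Function):
    """Runs a non-differentiable F in forward, and in backward passes the
    upstream gradient through unchanged, i.e. treats dz/dy as identity."""

    @staticmethod
    def forward(ctx, y):
        # torch.round stands in for your non-backpropagable iterative
        # method F(y); grad tracking is off inside forward anyway.
        return torch.round(y)

    @staticmethod
    def backward(ctx, grad_z):
        # dz/dy treated as 1: the incoming gradient flows straight through.
        return grad_z

y = torch.tensor([0.2, 1.7, 2.4], requires_grad=True)
z = StraightThroughF.apply(y)
z.sum().backward()

print(y.grad)  # tensor([1., 1., 1.]) — gradient passed as if F were identity
```

Equivalently, the one-liner `z = y + (F(y) - y).detach()` gives the same identity gradient without defining a custom Function.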
Can you share a minimal reproducible example of this? It'll be easier for people to help if they have a code snippet they can visualize and correct themselves!