Gradient question

Hello PyTorch community,

Consider a neural network N_\theta(x) which outputs y. Then consider an iterative method F applied on y that outputs z.

I need to compute a loss L(\theta) = |z_real - z_prediction|. However, as far as my limited understanding goes, the gradients cannot flow through F because it is an iterative method.

If I write out the gradient flow from z back to \theta, I have:

dz/d\theta = (dz/dy) (dy/d\theta)

What should I do with the term dz/dy? Is it possible to set it to 1 somehow?

Thanks in advance.

If F is backpropagable (i.e. differentiable), the gradients will flow through F.

Think of your big graph as:
y --> F_iter(y) --> z
But it can be decomposed into:
y --> F(y) --> z_1 --> F(z_1) --> z_2 --> F(z_2) --> … --> z_n --> F(z_n) --> z
where all the upstream and downstream gradients are available.
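Written out for the unrolled graph above, taking z_1 = F(y), z_{k+1} = F(z_k) and z = F(z_n) as in the diagram, the chain rule gives the following product (a sketch that assumes every step of F is differentiable):

```latex
\frac{\partial z}{\partial y}
  = \frac{\partial z}{\partial z_n}\,
    \frac{\partial z_n}{\partial z_{n-1}} \cdots
    \frac{\partial z_2}{\partial z_1}\,
    \frac{\partial z_1}{\partial y}
```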

And so dz/dy will be computed via the chain rule, like any other op in PyTorch. You can check this yourself: call your iterative function on a random tensor, sum the output to obtain a scalar, backpropagate to the random tensor, and check whether it has a gradient.
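A minimal sketch of that check could look like this. The function `iterative_f` is a made-up stand-in for your F (a few differentiable update steps), so swap in your own function; everything else is standard PyTorch.

```python
import torch


def iterative_f(y, n_steps=5):
    # Hypothetical stand-in for F: repeated differentiable updates.
    z = y
    for _ in range(n_steps):
        z = 0.5 * (z + torch.tanh(z))
    return z


# Random test input that tracks gradients.
y = torch.randn(4, requires_grad=True)

z = iterative_f(y)

# Reduce to a scalar and backpropagate up to y.
z.sum().backward()

has_grad = y.grad is not None
print("gradient reached y:", has_grad)
print(y.grad)
```

If F leaves the autograd graph (for example because it detaches its input or runs under `torch.no_grad()`), `z.requires_grad` will be `False` and the `backward()` call raises a `RuntimeError` instead.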

Unfortunately, F is not backpropagable in this case. I wonder:

  1. What would happen, in a theoretical sense, if I just set dz/dy to one?
  2. Is it possible to do that in PyTorch? Perhaps I could just get dy/d\theta and then multiply it by z_target - z_prediction.

Can you share a minimal reproducible example of this? It’ll be easier for people to help if they have a code snippet they can run and correct themselves!

It should be something like this. The calculate_z function runs inside a simulation and is not backpropagable. Therefore, I would like to set dz/dy = 1, but I would still like the difference between z_real and z_prediction to contribute to the update.

x: input
y: target 1
z: target 2

Neural network to optimise:

def example(self, x, y_real, z_real):
    y_predicted = self.network(x)  # self.network is a placeholder name for N_\theta
    z_predicted = self.calculate_z(y_predicted)
    self.loss = self.criterionL1(z_predicted, z_real) + self.criterionL1(y_real, y_predicted)

Any ideas?

Well, if your function isn’t differentiable, you can’t learn through it with gradient descent. If you want to set dz/dy to 1, you can do that with a torch.autograd.Function object whose forward is your calculate_z function and whose backward passes the incoming gradient through unchanged, something along the lines of grad_input = grad_output (this assumes z has the same shape as y).
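Here is a rough sketch of that idea, assuming z and y have the same shape. The `calculate_z` below is only a made-up non-differentiable stand-in for your simulation, and `StraightThroughZ` is a name I picked for illustration. The last part shows a `detach()` trick that should give the same gradient without a custom Function.

```python
import torch
import torch.nn.functional as F


def calculate_z(y):
    # Hypothetical stand-in for the simulation: no usable gradient.
    with torch.no_grad():
        return torch.round(2.0 * y)


class StraightThroughZ(torch.autograd.Function):
    @staticmethod
    def forward(ctx, y):
        # Run the real (non-differentiable) computation.
        return calculate_z(y)

    @staticmethod
    def backward(ctx, grad_output):
        # Pretend dz/dy is the identity: pass the gradient through unchanged.
        return grad_output


y = torch.tensor([0.2, 0.7, -1.1], requires_grad=True)
z_real = torch.tensor([1.0, 2.0, -1.0])

z_pred = StraightThroughZ.apply(y)
loss = F.l1_loss(z_pred, z_real)
loss.backward()
print(y.grad)  # gradient of the loss w.r.t. z_pred, copied onto y

# Alternative: same forward value, identity gradient w.r.t. y.
y_alt = y.detach().clone().requires_grad_(True)
z_alt = y_alt + (calculate_z(y_alt) - y_alt).detach()
F.l1_loss(z_alt, z_real).backward()
print(y_alt.grad)
```

On the theoretical side: with this substitution the update direction is dL/dz times dy/d\theta instead of dL/dz times dz/dy times dy/d\theta. That is only a reasonable approximation if z tends to move in the same direction as y; if the simulation behaves very differently from the identity, the updates may not reduce the loss.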