Understanding Backpropagation of the loss

@ptrblck Let's say I have two versions of the same image, the model's outputs for them are I_s and I_w, and the loss is some loss function.
Now I want to understand what happens if I use loss(I_w.detach(), I_s) in one case and loss(I_w, I_s) in the other.
What's the difference between the two ways of computing the loss?
What's the impact of the first versus the second?

And which one should be used?

Hi,

The first one will prevent gradients from flowing back towards I_w. So for the gradient computation, it will be as if this Tensor contained constant values that are independent of the weights.
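
To make that concrete, here is a minimal sketch (the tensor `w`, the input `x`, and the two outputs are made up for illustration, not taken from the original setup) comparing the gradients the two variants produce:

```python
import torch

# w stands in for the shared weights, I_w / I_s for the two model outputs.
w = torch.randn(3, requires_grad=True)
x = torch.randn(3)

I_w = w * x          # "weak" branch output
I_s = w * (x + 0.1)  # "strong" branch output

# Variant 1: detach I_w -> it is treated as a constant target.
loss_detached = ((I_s - I_w.detach()) ** 2).mean()
grad_detached = torch.autograd.grad(loss_detached, w, retain_graph=True)[0]

# Variant 2: no detach -> gradients flow back through both branches.
loss_full = ((I_s - I_w) ** 2).mean()
grad_full = torch.autograd.grad(loss_full, w)[0]

print(grad_detached)  # contribution from the I_s path only
print(grad_full)      # contributions from both the I_s and I_w paths
```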

And which one should be used?

It depends on what you’re trying to do :smiley:

@albanD I want to enforce consistency between I_w and I_s with a shared-weights model (i.e. both I_w and I_s are produced by the same model).
The loss is computed for both of these, so I want my model to make consistent predictions across I_w and I_s.

In that case, I don't think you want to detach, as the gradients from both are important for what you want to do.
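
As a rough sketch of that setup (assuming a hypothetical `nn.Linear` model and two augmented batches `img_weak` / `img_strong`; these names are not from the original post):

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 4)         # stand-in for the shared-weight model
img_weak = torch.randn(8, 16)    # weakly augmented batch (hypothetical)
img_strong = torch.randn(8, 16)  # strongly augmented batch (hypothetical)

I_w = model(img_weak)            # both views go through the same model
I_s = model(img_strong)

# No detach: gradients from both branches pull the two predictions together.
consistency_loss = ((I_s - I_w) ** 2).mean()
consistency_loss.backward()
```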

@albanD Now if I don't use detach, the weights will also change in such a way that the predictions for I_w move closer to I_s (movement towards I_s). But what I want is to make the predictions for I_s as close as possible to I_w (movement towards I_w), on the assumption that the predictions for I_w are good.

Ok, so if you consider that I_w is a constant with the right value, and you just want to make I_s get closer to it, then you can indeed detach I_w.

But be aware that changes made to move I_s will actually impact I_w as well (even though you hide that from autograd), and that can hinder convergence.
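
A sketch of that detached variant, reusing the same hypothetical names as above, with the caveat noted in a comment:

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 4)
img_weak = torch.randn(8, 16)
img_strong = torch.randn(8, 16)

target = model(img_weak).detach()  # I_w treated as a constant, assumed "good"
I_s = model(img_strong)

loss = ((I_s - target) ** 2).mean()
loss.backward()                    # gradients flow only through the I_s branch

# Caveat: the weight update driven by I_s also changes the weights that
# produce I_w, so the "constant" target will move on the next forward pass.
```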