Understanding Backpropagation of the loss

@ptrblck Let's say I have two versions of the same image, the model's outputs for them are I_s and I_w, and the loss is some loss function.
Now I want to understand what happens if I use loss(I_w.detach(), I_s) in one case and loss(I_w, I_s) in the other.
What's the difference between the two ways of computing the loss?
What's the impact of the first versus the second?

And which one should be used?

Hi,

The first one will prevent gradients from flowing back towards I_w. So for the gradient computation, it will be as if this Tensor contained constant values that are independent of the weights.
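
To make that concrete, here is a minimal sketch (the tensor `w`, the input `x`, and the two outputs are made up for illustration, not taken from the original setup) comparing the gradients the two variants produce:

```python
import torch

# w stands in for the shared weights, I_w / I_s for the two model outputs.
w = torch.randn(3, requires_grad=True)
x = torch.randn(3)

I_w = w * x          # "weak" branch output
I_s = w * (x + 0.1)  # "strong" branch output

# Variant 1: detach I_w -> it is treated as a constant target.
loss_detached = ((I_s - I_w.detach()) ** 2).mean()
grad_detached = torch.autograd.grad(loss_detached, w, retain_graph=True)[0]

# Variant 2: no detach -> gradients flow back through both branches.
loss_full = ((I_s - I_w) ** 2).mean()
grad_full = torch.autograd.grad(loss_full, w)[0]

print(grad_detached)  # contribution from the I_s path only
print(grad_full)      # contributions from both the I_s and I_w paths
```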

And which one should be used?

It depends on what you’re trying to do :smiley:

@albanD I want to enforce consistency between I_w and I_s with a shared-weights model (i.e. both I_w and I_s are produced by the same model).
The loss is computed for both of these, so I want my model to make consistent predictions across I_w and I_s.

In that case, I don't think you want to detach, as the gradients from both are important for what you want to do.
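
As a rough sketch of that setup (assuming a hypothetical `nn.Linear` model and two augmented batches `img_weak` / `img_strong`; these names are not from the original post):

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 4)         # stand-in for the shared-weight model
img_weak = torch.randn(8, 16)    # weakly augmented batch (hypothetical)
img_strong = torch.randn(8, 16)  # strongly augmented batch (hypothetical)

I_w = model(img_weak)            # both views go through the same model
I_s = model(img_strong)

# No detach: gradients from both branches pull the two predictions together.
consistency_loss = ((I_s - I_w) ** 2).mean()
consistency_loss.backward()
```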

@albanD Now if I don't use detach, the weights will also change in such a way that the predictions for I_w move closer to I_s (movement towards I_s). But what I want is to make the predictions for I_s as close as possible to I_w (movement towards I_w), on the assumption that the predictions for I_w are good.

Ok, so if you consider that I_w is a constant with the right value, and you just want to make I_s get closer to it, then you can indeed detach I_w.

But be aware that changes made to move I_s will actually impact I_w as well (even though you hide that from autograd), and that can hinder convergence.
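
A sketch of that detached variant, reusing the same hypothetical names as above, with the caveat noted in a comment:

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 4)
img_weak = torch.randn(8, 16)
img_strong = torch.randn(8, 16)

target = model(img_weak).detach()  # I_w treated as a constant, assumed "good"
I_s = model(img_strong)

loss = ((I_s - target) ** 2).mean()
loss.backward()                    # gradients flow only through the I_s branch

# Caveat: the weight update driven by I_s also changes the weights that
# produce I_w, so the "constant" target will move on the next forward pass.
```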