@ptrblck Let's say I have two versions of the same image, the outputs of the model for them are I_s and I_w, and the loss is some loss function.
Now I want to understand the difference if I use loss(I_w.detach(), I_s) in one case and loss(I_w, I_s) in the other.
What's the difference between the two ways of computing the loss?
What's the impact of the first versus the second?
The first one will prevent gradients from flowing back towards I_w. So for the gradient computation, it will be as if this Tensor contained constant values that are independent of the weights.
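For illustration, here is a minimal sketch of that difference, where a scalar weight `w` and two linear maps stand in for the model and the two forward passes:

```python
import torch

w = torch.tensor(2.0, requires_grad=True)
I_w = w * 3.0          # stands in for model(weak_image)
I_s = w * 5.0          # stands in for model(strong_image)

# Case 1: detach the I_w branch -> gradients only flow through I_s
loss1 = (I_w.detach() - I_s).pow(2).mean()
loss1.backward()
print(w.grad)          # contribution from I_s only

w.grad = None

# Case 2: no detach -> gradients flow through both branches
loss2 = (I_w - I_s).pow(2).mean()
loss2.backward()
print(w.grad)          # contributions from both I_w and I_s
```

The two printed gradients differ, because in the second case the update also tries to move I_w towards I_s.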
@albanD I want to enforce consistency between I_w and I_s with a shared-weights model (i.e., both images are passed through the same model).
The loss is computed on both of these outputs, so I want my model to make consistent predictions across I_w and I_s.
@albanD Now, if I don't use detach, the weights will also change in such a way that the predictions for I_w move closer to I_s (movement towards I_s). But what I want is to make the predictions for I_s as close as possible to I_w (movement towards I_w), on the assumption that the predictions for I_w are good.
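In that case detaching the I_w branch does exactly what you describe. A minimal sketch of the setup, where `model`, `x_weak`, and `x_strong` are placeholders for your network and the two augmented views of the same image:

```python
import torch
import torch.nn.functional as F

model = torch.nn.Linear(10, 5)   # placeholder for the shared-weights model
x_weak = torch.randn(4, 10)      # weakly augmented view
x_strong = torch.randn(4, 10)    # strongly augmented view

I_w = model(x_weak)              # treated as the (fixed) target
I_s = model(x_strong)            # pulled towards I_w

# Detaching I_w means only the strong branch receives gradients,
# so the update moves the predictions for I_s towards I_w,
# not the other way around.
loss = F.mse_loss(I_s, I_w.detach())
loss.backward()
```

Without the `.detach()`, the same loss would also push the predictions for I_w towards I_s, which is the movement you want to avoid.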