Loss computation order changes the results

I’m not sure if this is a bug.

sup_pred = model(supervised_image)
unsup_pred = model(unsupervised_image)

Keeping this constant, but changing the order of the two loss computations below:

loss_unsup = lossfn(unsup_pred, unsup_target)
loss_sup = lossfn(sup_pred, sup_target)

vs.

loss_sup = lossfn(sup_pred, sup_target)
loss_unsup = lossfn(unsup_pred, unsup_target)

If I flip these two lines, I get widely different accuracy during evaluation. The

opt.zero_grad()
loss.backward()
opt.step()

remain the same. Why does this happen?
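To make the setup concrete, here is a self-contained toy version of the step (the real model and data are different, and for this sketch I'm assuming the two losses are simply summed before backward):

import torch
import torch.nn as nn

# toy stand-ins for the real model, optimizer and data
model = nn.Linear(32, 10)
lossfn = nn.CrossEntropyLoss()
opt = torch.optim.SGD(model.parameters(), lr=0.1)

supervised_image = torch.randn(8, 32)
unsupervised_image = torch.randn(8, 32)
sup_target = torch.randint(0, 10, (8,))
unsup_target = torch.randint(0, 10, (8,))

sup_pred = model(supervised_image)
unsup_pred = model(unsupervised_image)

# only the order of these two lines is swapped between the two runs
loss_sup = lossfn(sup_pred, sup_target)
loss_unsup = lossfn(unsup_pred, unsup_target)

loss = loss_sup + loss_unsup  # assumed combination; the real objective may weight them
opt.zero_grad()
loss.backward()
opt.step()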

Hi,

This is most likely an issue in your code that has a side effect.
What loss function are you using? Do you modify any Tensor or state in place during that computation?
Also, what about the rest of the code between the two calls? And how do you check that you get different results?
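For example, if the loss computation touched some shared state in place, the second call would see whatever the first call left behind, and then the order would matter. A contrived sketch of the kind of thing I mean (not saying this is your code):

import torch
import torch.nn.functional as F

temperature = torch.tensor(1.0)

def loss_with_side_effect(pred, target):
    # side effect: the shared temperature is updated in place on every call,
    # so the value each call sees depends on how many calls ran before it
    temperature.mul_(1.1)
    return F.cross_entropy(pred / temperature, target)

torch.manual_seed(0)
pred_sup, pred_unsup = torch.randn(4, 10), torch.randn(4, 10)
tgt_sup, tgt_unsup = torch.randint(0, 10, (4,)), torch.randint(0, 10, (4,))

temperature.fill_(1.0)
loss_sup_first = loss_with_side_effect(pred_sup, tgt_sup)    # sees temperature 1.1
_ = loss_with_side_effect(pred_unsup, tgt_unsup)

temperature.fill_(1.0)
_ = loss_with_side_effect(pred_unsup, tgt_unsup)
loss_sup_second = loss_with_side_effect(pred_sup, tgt_sup)   # sees temperature 1.21

print(loss_sup_first.item(), loss_sup_second.item())  # same inputs, different values; only the order changed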

Thank you for your reply.

I am using cross entropy loss in both cases (nn.CrossEntropyLoss()).

I don't modify any state of the model, and I don't update any tensors either.
My evaluation script is pretty standard: with torch.no_grad(): model.eval() …

For your reference, this is pretty much my code:

There are a lot of functions that are not present in that script (they are called from elsewhere), but you can assume all losses/criteria are cross entropy (xent).

When I flip loss_sup with loss_unsup, the results change.

Do I understand correctly from your answer ("My evaluation script is pretty standard: with torch.no_grad(): model.eval() …") that you compare the results by checking whether the two runs behave exactly the same at evaluation time?

I am afraid floating point operations are not associative, and doing things in a different order will lead to differences in the result.
While you should not see it for a single iteration, during training, gradient-descent-based optimization will amplify these small differences and you will most likely converge to a different point.
If your model is well behaved, this other point should have similar properties, so this is not a problem for ML in general.
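To make the non-associativity point concrete, here is a tiny Python example (the same effect shows up in float32 tensor ops, where the accumulation order inside a kernel can also vary):

a, b, c = 0.1, 0.2, 0.3
print((a + b) + c)                   # 0.6000000000000001
print(a + (b + c))                   # 0.6
print((a + b) + c == a + (b + c))    # False: same numbers, different order, different result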

Do I understand correctly from your answer ("My evaluation script is pretty standard: with torch.no_grad(): model.eval() …") that you compare the results by checking whether the two runs behave exactly the same at evaluation time?

Yes.

I am afraid floating point operations are not associative, and doing things in a different order will lead to differences in the result.

Good to know. Very interesting, since these were completely distinct calculations. I think, as you say, my training is not that stable if it leads to a gap like this.

Thank you for your help!