Loss computation order changes the results

I’m not sure if this is a bug.

sup_pred = model(supervised_image)
unsup_pred = model(unsupervised_image)

Keeping this constant, but changing the order of the two loss computations below:

loss_unsup = lossfn(unsup_pred, unsup_target)
loss_sup = lossfn(sup_pred, sup_target)

vs.

loss_sup = lossfn(sup_pred, sup_target)
loss_unsup = lossfn(unsup_pred, unsup_target)

If I flip these two lines, I get widely different accuracy during evaluation. The

opt.zero_grad()
loss.backward()
opt.step()

remain the same. Why does this happen?
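To make the setup concrete, here is a self-contained toy version of the step (the real model and data are different, and for this sketch I'm assuming the two losses are simply summed before backward):

import torch
import torch.nn as nn

# toy stand-ins for the real model, optimizer and data
model = nn.Linear(32, 10)
lossfn = nn.CrossEntropyLoss()
opt = torch.optim.SGD(model.parameters(), lr=0.1)

supervised_image = torch.randn(8, 32)
unsupervised_image = torch.randn(8, 32)
sup_target = torch.randint(0, 10, (8,))
unsup_target = torch.randint(0, 10, (8,))

sup_pred = model(supervised_image)
unsup_pred = model(unsupervised_image)

# only the order of these two lines is swapped between the two runs
loss_sup = lossfn(sup_pred, sup_target)
loss_unsup = lossfn(unsup_pred, unsup_target)

loss = loss_sup + loss_unsup  # assumed combination; the real objective may weight them
opt.zero_grad()
loss.backward()
opt.step()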

Hi,

This is most likely an issue in your code that has a side effect.
What loss function are you using? Do you modify any Tensor or state in place during that computation?
Also, what about the rest of the code between the two calls? And how do you check that you get different results?
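For example, if the loss computation touched some shared state in place, the second call would see whatever the first call left behind, and then the order would matter. A contrived sketch of the kind of thing I mean (not saying this is your code):

import torch
import torch.nn.functional as F

temperature = torch.tensor(1.0)

def loss_with_side_effect(pred, target):
    # side effect: the shared temperature is updated in place on every call,
    # so the value each call sees depends on how many calls ran before it
    temperature.mul_(1.1)
    return F.cross_entropy(pred / temperature, target)

torch.manual_seed(0)
pred_sup, pred_unsup = torch.randn(4, 10), torch.randn(4, 10)
tgt_sup, tgt_unsup = torch.randint(0, 10, (4,)), torch.randint(0, 10, (4,))

temperature.fill_(1.0)
loss_sup_first = loss_with_side_effect(pred_sup, tgt_sup)    # sees temperature 1.1
_ = loss_with_side_effect(pred_unsup, tgt_unsup)

temperature.fill_(1.0)
_ = loss_with_side_effect(pred_unsup, tgt_unsup)
loss_sup_second = loss_with_side_effect(pred_sup, tgt_sup)   # sees temperature 1.21

print(loss_sup_first.item(), loss_sup_second.item())  # same inputs, different values; only the order changed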

Thank you for your reply.

I am using cross entropy loss in both cases (nn.CrossEntropyLoss()).

I don't modify any state of the model, and I don't update any tensors either.
My evaluation script is pretty standard: with torch.no_grad(): model.eval() …

For your reference, this is pretty much my code:

There are a lot of functions that are not present in that script (they are called from elsewhere), but you can assume all losses/criteria are cross entropy (xent).

When I flip loss_sup with loss_unsup, the results change.

Do I understand correctly from your answer ("My evaluation script is pretty standard: with torch.no_grad(): model.eval() …") that you compare the results by checking whether the two runs behave exactly the same at evaluation time?

I am afraid floating point operations are not associative, and doing things in a different order will lead to differences in the result.
While you should not see it for a single iteration, during training, gradient-descent-based optimization will amplify these small differences and you will most likely converge to a different point.
If your model is well behaved, this other point should have similar properties, so this is not a problem for ML in general.
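To make the non-associativity point concrete, here is a tiny Python example (the same effect shows up in float32 tensor ops, where the accumulation order inside a kernel can also vary):

a, b, c = 0.1, 0.2, 0.3
print((a + b) + c)                   # 0.6000000000000001
print(a + (b + c))                   # 0.6
print((a + b) + c == a + (b + c))    # False: same numbers, different order, different result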

Do I understand correctly from your answer ("My evaluation script is pretty standard: with torch.no_grad(): model.eval() …") that you compare the results by checking whether the two runs behave exactly the same at evaluation time?

Yes.

I am afraid floating point operations are not associative, and doing things in a different order will lead to differences in the result.

Good to know. Very interesting, since these were completely distinct calculations. I think, as you say, my training is not that stable if it leads to a gap like this.

Thank you for your help!