Hi, I noticed that the output of cross-entropy loss with reduction="mean" (for the semantic segmentation use case, i.e. the K-dimensional variant) differs from what I get by taking the mean of the unreduced (reduction="none") output. The discrepancy is most visible with a bigger batch size.
A minimal working example:
import torch
import torch.nn as nn
import numpy as np
basic_img = torch.Tensor(np.random.rand(128, 2, 768, 768))  # float32 logits, shape (N, C, H, W)
label_basic = torch.Tensor(np.random.randint(2, size=(128, 768, 768))).long()  # int64 class indices, shape (N, H, W)
criterion = nn.CrossEntropyLoss(reduction='none', weight=None)
loss = criterion(basic_img, label_basic)
loss.mean()
>>> tensor(0.7137)
criterion = nn.CrossEntropyLoss(reduction='mean', weight=None)
loss = criterion(basic_img, label_basic)
loss
>>> tensor(1.6724)
As far as I understand, these two versions should be equal. Am I missing something?
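To spell out the equality I am expecting, here is a small sanity check (sizes chosen arbitrarily, not part of the repro above): with weight=None, reduction='mean' should just be the mean of the unreduced losses, and on small inputs the two do agree.
import torch
import torch.nn as nn
small_logits = torch.randn(4, 2, 8, 8)                       # float32 logits (N, C, H, W)
small_labels = torch.randint(2, size=(4, 8, 8))               # int64 targets (N, H, W)
unreduced = nn.CrossEntropyLoss(reduction='none')(small_logits, small_labels)
reduced = nn.CrossEntropyLoss(reduction='mean')(small_logits, small_labels)
torch.allclose(unreduced.mean(), reduced)  # I'd expect True here, and it is for small inputs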
When I use double precision, or the GPU, or smaller tensors, the issue seems to go away. Naively, using reduction='none' and then taking the mean() seems to give the “right” answer. (I don’t think this can be explained away as a legitimate consequence of floating-point round-off error.)
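For concreteness, this is roughly what I tried (reusing basic_img and label_basic from the snippet above; I am only sketching the checks, since the exact numbers depend on the random data):
import torch.nn as nn
# Check 1: double precision — with float64 inputs, reduction='mean' matches the unreduced mean.
loss_double = nn.CrossEntropyLoss(reduction='mean')(basic_img.double(), label_basic)
# Check 2: float32 workaround — reduce manually instead of relying on reduction='mean'.
loss_manual = nn.CrossEntropyLoss(reduction='none')(basic_img, label_basic).mean()
# Check 3: GPU — the built-in 'mean' reduction also looks fine there (requires a CUDA device).
# loss_gpu = nn.CrossEntropyLoss(reduction='mean')(basic_img.cuda(), label_basic.cuda())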
Yes, this indeed looks like a bug.
Thank you very much for creating this great code snippet, and @Dominika_Basaj thanks a lot for reporting this issue.
I’ll create an issue on GitHub and will link it here.