Difference between multi-label and single-label CrossEntropyLoss

Hi, I have a question about CrossEntropyLoss.

If I understand correctly, these two calls should give the same output:

import torch
from torch.nn import CrossEntropyLoss

loss_fct = CrossEntropyLoss(reduction="none")
t = torch.randn(2, 10, 5)        # logits: [batch, seq_len, num_classes]
l = torch.randint(0, 5, (2, 10)) # targets: [batch, seq_len]

# 1) move the class dimension to position 1, as expected for K-dimensional input
loss_fct(t.transpose(-2, -1), l)
# 2) flatten to 2D, compute the per-token loss, and reshape it back
loss_fct(t.view(-1, t.size(-1)), l.view(-1)).view(t.size(0), -1)

However, while they do match in this simple example, in some real loss calculations I have noticed that they don’t.

Are these rounding errors of some kind, or is there another possible explanation?
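For reference, this is roughly how I compare them; the tolerance handling is just torch.allclose’s defaults, and loss_a/loss_b are names I picked for this post:

import torch
from torch.nn import CrossEntropyLoss

loss_fct = CrossEntropyLoss(reduction="none")
t = torch.randn(2, 10, 5)
l = torch.randint(0, 5, (2, 10))

loss_a = loss_fct(t.transpose(-2, -1), l)
loss_b = loss_fct(t.view(-1, t.size(-1)), l.view(-1)).view(t.size(0), -1)

# allclose is True when the two tensors differ only by numerical noise;
# the max absolute difference shows how large any discrepancy actually is.
print(torch.allclose(loss_a, loss_b))
print((loss_a - loss_b).abs().max())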

The second approach (as originally posted, without the .view(-1) on the target) will fail with:

ValueError: Expected input batch_size (20) to match target batch_size (2).

since t will be flattened to [20, 5], which increases the batch size, while l still has the shape [2, 10].
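For completeness, a minimal reproduction of that mismatch (same setup as the snippet above; the commented-out error is the one quoted here):

import torch
from torch.nn import CrossEntropyLoss

loss_fct = CrossEntropyLoss(reduction="none")
t = torch.randn(2, 10, 5)
l = torch.randint(0, 5, (2, 10))

# t is flattened to [20, 5] but l keeps its original [2, 10] shape:
loss_fct(t.view(-1, t.size(-1)), l)
# ValueError: Expected input batch_size (20) to match target batch_size (2).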

Sorry, I missed a .view(-1); it should work now.

Yes, in this case it should yield the same result, and the backend will also flatten the tensors in the same way, as seen here.
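A quick sanity check along those lines (a sketch: it compares both the per-token losses and the gradients they produce; exact bitwise equality is not guaranteed across backends, hence allclose rather than equal):

import torch
from torch.nn import CrossEntropyLoss

torch.manual_seed(0)
loss_fct = CrossEntropyLoss(reduction="none")
t = torch.randn(2, 10, 5, requires_grad=True)
l = torch.randint(0, 5, (2, 10))

loss_a = loss_fct(t.transpose(-2, -1), l)
loss_b = loss_fct(t.view(-1, t.size(-1)), l.view(-1)).view(t.size(0), -1)

# Forward values agree up to floating-point tolerance...
assert torch.allclose(loss_a, loss_b)

# ...and so do the gradients with respect to the logits.
grad_a, = torch.autograd.grad(loss_a.sum(), t)
grad_b, = torch.autograd.grad(loss_b.sum(), t)
assert torch.allclose(grad_a, grad_b)
print("forward and backward match")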