Cross-Entropy "sum"/N vs Cross-Entropy "mean"

F.cross_entropy(results, labels, weight=torch.Tensor([3, 1, 9, 8]).cuda(), reduction="sum") / results.shape[0]
Out[13]: tensor(2.0683, device='cuda:0')
F.cross_entropy(results, labels, weight=torch.Tensor([3, 1, 9, 8]).cuda(), reduction="mean")
Out[14]: tensor(0.9643, device='cuda:0')

What am I missing? Why are these not equal? It's somehow connected to the weighting, but I haven't figured out how yet. Thanks.
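
For anyone trying to reproduce this, here is a minimal self-contained CPU sketch. The logits and labels are made up, since the original results and labels aren't shown; the point is only that the two values diverge once weight is passed (without it they agree):

import torch
import torch.nn.functional as F

torch.manual_seed(0)
results = torch.randn(5, 4)             # made-up logits: batch of 5, 4 classes
labels = torch.tensor([0, 2, 1, 3, 2])  # made-up targets
weights = torch.Tensor([3, 1, 9, 8])

# These two values differ once `weight` is passed:
print(F.cross_entropy(results, labels, weight=weights, reduction="sum") / results.shape[0])
print(F.cross_entropy(results, labels, weight=weights, reduction="mean"))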

The reduction="mean" will average with respect to all elements, whereas in the other expression you are averaging with respect to the batch size. So the denominator for computing the average in the first case is just the batch size (results.shape[0]), whereas in the other case it is np.prod(results.shape).

Actually, I tried that and it is not true. The difference in my example is caused by the weighting scheme; if there were no weight parameter, the results would be the same. The normalization should actually be weighted by the class weights:

weights = torch.Tensor([3, 1, 9, 8]).cuda()
F.cross_entropy(results, labels, weight=weights, reduction="sum") / sum([weights[k] for k in labels])

produces the same output.
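
A quick way to double-check this claim (same made-up inputs as in the sketch above; weights[labels].sum() is just a vectorized form of the list comprehension):

import torch
import torch.nn.functional as F

torch.manual_seed(0)
results = torch.randn(5, 4)             # made-up logits
labels = torch.tensor([0, 2, 1, 3, 2])  # made-up targets
weights = torch.Tensor([3, 1, 9, 8])

sum_loss = F.cross_entropy(results, labels, weight=weights, reduction="sum")
mean_loss = F.cross_entropy(results, labels, weight=weights, reduction="mean")

# "sum" divided by the total weight of the targets matches "mean":
assert torch.allclose(sum_loss / weights[labels].sum(), mean_loss)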

Yes, that’s right. I had ignored the weights in my reply. So, as you have pointed out, with weights the sum should be divided by the sum of the weights of the target classes.
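
For completeness, this matches what the docs describe for the weighted mean: with reduction="none" each per-sample loss already includes the factor weight[target], and reduction="mean" divides their sum by the total target weight, not by the batch size. A small sketch with the same made-up inputs:

import torch
import torch.nn.functional as F

torch.manual_seed(0)
results = torch.randn(5, 4)
labels = torch.tensor([0, 2, 1, 3, 2])
weights = torch.Tensor([3, 1, 9, 8])

# Per-sample losses are already scaled by the weight of each target class.
per_sample = F.cross_entropy(results, labels, weight=weights, reduction="none")

# mean = sum(w[y_i] * loss_i) / sum(w[y_i])
manual_mean = per_sample.sum() / weights[labels].sum()
assert torch.allclose(manual_mean,
                      F.cross_entropy(results, labels, weight=weights, reduction="mean"))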