With reduction="mean" the average is taken over all elements, but in the other one you are averaging over the batch size. So the denominator when you divide by the batch size yourself is just results.shape[0], whereas with reduction="mean" it is np.prod(results.shape).
Actually I tried that and it is not true. The difference in my example is caused by the weighting scheme: if there were no weight parameter, the two results would be the same. The normalization should actually be weighted by the class weights, e.g.:
weights = torch.Tensor([3, 1, 9, 8]).cuda()
# divide the summed loss by the total weight of the target classes
F.cross_entropy(results, labels, weight=weights, reduction="sum") / sum([weights[k] for k in labels])
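A minimal self-contained sketch (on CPU, with made-up results and labels) that checks this equivalence; both torch.allclose calls should print True:

import torch
import torch.nn.functional as F

torch.manual_seed(0)
results = torch.randn(5, 4)              # hypothetical logits: batch of 5, 4 classes
labels = torch.tensor([0, 2, 1, 3, 2])   # hypothetical targets
weights = torch.tensor([3., 1., 9., 8.])

# with weight=, reduction="mean" divides by the summed weights of the targets
mean_loss = F.cross_entropy(results, labels, weight=weights, reduction="mean")
manual = F.cross_entropy(results, labels, weight=weights, reduction="sum") / weights[labels].sum()
print(torch.allclose(mean_loss, manual))  # True

# without weight=, reduction="mean" reduces to sum / batch size
unweighted = F.cross_entropy(results, labels, reduction="mean")
print(torch.allclose(unweighted, F.cross_entropy(results, labels, reduction="sum") / results.shape[0]))  # True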