Weighted cross-entropy gives the same result as unweighted cross-entropy when reduction='mean'

Hi,

I was trying out the weight parameter of the cross-entropy loss, and I noticed that I get the same result with and without weights when using the default reduction parameter:

>>> import torch

>>> input = torch.randn(1, 5, requires_grad=True)
>>> input
tensor([[ 1.2900,  0.4519, -0.2622, -0.5862, -0.5304]], requires_grad=True)

>>> target = torch.empty(1, dtype=torch.long).random_(5)
>>> target
tensor([0])

>>> weights = torch.randn(5).abs()
>>> weights
tensor([1.4737, 0.8730, 0.7921, 0.2246, 1.5051])

>>> cross_entropy = torch.nn.CrossEntropyLoss()
>>> cross_entropy_weighted = torch.nn.CrossEntropyLoss(weight=weights)

>>> cross_entropy(input, target)
tensor(0.6727, grad_fn=<NllLossBackward>)

>>> cross_entropy_weighted(input, target)
tensor(0.6727, grad_fn=<NllLossBackward>)

Am I doing the weighting correctly? How is the weight parameter of the cross-entropy loss supposed to be used?

Thanks

Hi Dhorka!

The short answer is that your batch size is 1, so you won’t see a
difference.

Yes, you’re doing the weighting correctly and the result is correct.

But CrossEntropyLoss calculates a weighted average when using
class weights (and the default reduction = 'mean'). So when you
take the weighted average over a single sample, the weight drops out.
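(Concretely, with class weights and reduction = 'mean' the reduced loss is

    sum_i weight[target_i] * loss_i / sum_i weight[target_i]

so for a batch consisting of a single sample of class c it becomes
weight[c] * loss / weight[c] = loss, and the weight cancels out.)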

Try this again with a batch size of more than one (and make
sure that the batch contains samples from classes with different
weights). Then the values of the weighted and unweighted loss
functions will differ.
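
For example, something along these lines (the class indices and
weight values below are just made up for illustration):

import torch

input = torch.randn(4, 5, requires_grad=True)    # batch of 4 samples
target = torch.tensor([0, 2, 2, 4])              # classes with different weights
weights = torch.tensor([1.5, 0.9, 0.8, 0.2, 1.5])

cross_entropy = torch.nn.CrossEntropyLoss()
cross_entropy_weighted = torch.nn.CrossEntropyLoss(weight=weights)

# these two values will now (in general) differ, because the weighted
# version averages the per-sample losses with weights weight[target[i]]
print(cross_entropy(input, target))
print(cross_entropy_weighted(input, target))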

Best.

K. Frank

Thanks for the clarification! I think this should be mentioned in the documentation :frowning: At the moment the documentation says:

‘mean’: the sum of the output will be divided by the number of elements in the output,

That was the source of my misunderstanding.

Hello Dhorka!

Yes, you’re right about this. Read literally, the current documentation
for CrossEntropyLoss is incorrect.

The line you cite is wrong (or, at best, imprecise or incomplete).

Also, when read strictly, this passage:

or in the case of the weight argument being specified:

loss(x, class) = weight[class] (…)

The losses are averaged across observations for each minibatch.

is incorrect because it doesn’t show the final reduced loss being
divided by the sum of the weights.
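
This is easy to check directly (a quick sketch, with arbitrary
numbers): with class weights and reduction = 'mean', the reduced
loss is the sum of the (already weighted) per-sample losses divided
by the sum of the applied weights, not by the batch size:

import torch

input = torch.randn(4, 5)
target = torch.tensor([0, 2, 2, 4])
weights = torch.tensor([1.5, 0.9, 0.8, 0.2, 1.5])

per_sample = torch.nn.CrossEntropyLoss(weight=weights, reduction='none')(input, target)
mean_loss = torch.nn.CrossEntropyLoss(weight=weights, reduction='mean')(input, target)

# 'mean' divides by the sum of the applied weights, not by the number of samples
print(torch.allclose(mean_loss, per_sample.sum() / weights[target].sum()))   # True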

I believe I’ve seen this issue – whether the average for weighted
loss is divided by the sum of the weights – come up in this forum
before. (I don’t have the references off hand.)

@ptrblck, would it make sense to have the Documentation Mavens
clarify this? (Also, it looks like BCELoss and BCEWithLogitsLoss
get this wrong, while NLLLoss gets it right.)

Best.

K. Frank

Yes, I ran into this issue before and had to manually check how the weights are used in the different reduction settings.
A fix would be very welcome. Let me know if you would be interested in creating the PR. :slight_smile:
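
For reference, the manual check can look roughly like this (a sketch
with arbitrary values), comparing how the weights enter the different
reduction modes:

import torch
import torch.nn.functional as F

logits = torch.randn(4, 5)
target = torch.tensor([0, 2, 2, 4])
w = torch.tensor([1.5, 0.9, 0.8, 0.2, 1.5])

none_loss = F.cross_entropy(logits, target, weight=w, reduction='none')
sum_loss = F.cross_entropy(logits, target, weight=w, reduction='sum')
mean_loss = F.cross_entropy(logits, target, weight=w, reduction='mean')

print(torch.allclose(sum_loss, none_loss.sum()))              # 'sum' just sums the weighted per-sample losses
print(torch.allclose(mean_loss * w[target].sum(), sum_loss))  # 'mean' divides that sum by the weight sum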