Weight in cross entropy loss

Hello Mainul!

When you use CrossEntropyLoss (weight = sc) with class weights and the
default mean reduction, the average loss that is computed is a weighted
average. That is, the sum of the per-sample losses is divided by the sum
of the weights applied to those samples, rather than by the number of
samples.
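
In pseudocode (a sketch of this convention, with w the weight vector,
loss_i the unreduced loss for sample i, and the weight for each sample
looked up by its target class):

    mean_loss = sum_i (w[target[i]] * loss_i) / sum_i (w[target[i]])

rather than

    mean_loss = sum_i (w[target[i]] * loss_i) / n_samples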

The following script (PyTorch version 0.3.0) illustrates this:

import torch
torch.__version__

sc = torch.FloatTensor ([0.4,0.36])
loss = torch.nn.CrossEntropyLoss (weight = sc)
input = torch.autograd.Variable (torch.FloatTensor ([[3.0,4.0],[6.0,9.0]]))
target = torch.autograd.Variable (torch.LongTensor ([1,0]))
output = loss (input, target)
print (output)

probs = torch.nn.Softmax (dim = 1) (input)
output2 = -(torch.log (probs[0, 1]) * sc[1] + torch.log (probs[1, 0]) * sc[0]) / (sc[0] + sc[1])
print (output2)
print (((sc[0] + sc[1]) / 2.0) * output2)
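
(The first print applies the weighted CrossEntropyLoss directly, output2
recomputes the same weighted mean by hand from the softmax probabilities,
and the last print rescales output2 by the plain average of the weights to
show what you would get by dividing by the number of samples instead.)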

Here is the output:

>>> import torch
>>> torch.__version__
'0.3.0b0+591e73e'
>>>
>>> sc = torch.FloatTensor ([0.4,0.36])
>>> loss = torch.nn.CrossEntropyLoss (weight = sc)
>>> input = torch.autograd.Variable (torch.FloatTensor ([[3.0,4.0],[6.0,9.0]]))
>>> target = torch.autograd.Variable (torch.LongTensor ([1,0]))
>>> output = loss (input, target)
>>> print (output)
Variable containing:
 1.7529
[torch.FloatTensor of size 1]

>>>
>>> probs = torch.nn.Softmax (dim = 1) (input)
>>> output2 = -(torch.log (probs[0, 1]) * sc[1] + torch.log (probs[1, 0]) * sc[0]) / (sc[0] + sc[1])
>>> print (output2)
Variable containing:
 1.7529
[torch.FloatTensor of size 1]

>>> print (((sc[0] + sc[1]) / 2.0) * output2)
Variable containing:
 0.6661
[torch.FloatTensor of size 1]

You can see that the (weighted) CrossEntropyLoss result and the
“manual” result now match. And at the end we recover your manual
result by undoing the division by the sum of the weights, i.e., by
dividing the weighted sum by the number of samples instead.
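
For what it's worth, the same check can be written in more recent PyTorch
versions without Variable. This is just a sketch, assuming the current
CrossEntropyLoss API (with reduction = 'mean' as the default); the printed
values should match the 0.3.0 result above:

import torch

sc = torch.tensor ([0.4, 0.36])
loss = torch.nn.CrossEntropyLoss (weight = sc)   # default reduction = 'mean'
input = torch.tensor ([[3.0, 4.0], [6.0, 9.0]])
target = torch.tensor ([1, 0])
print (loss (input, target))    # weighted mean, about 1.7529

# "manual" version: weighted sum of per-sample losses divided by the
# sum of the sample weights (not by the number of samples)
logprobs = torch.log_softmax (input, dim = 1)
output2 = -(logprobs[0, 1] * sc[1] + logprobs[1, 0] * sc[0]) / (sc[1] + sc[0])
print (output2)                 # also about 1.7529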

Best.

K. Frank
