# Weight in cross entropy loss

I was trying to understand how `weight` in `CrossEntropyLoss` works with a practical example. So I first ran the standard PyTorch code and then did the calculation manually, but the two losses are not the same.

```python
import math

import torch
from torch import nn

softmax = nn.Softmax(dim=1)
sc = torch.tensor([0.4, 0.36])
loss = nn.CrossEntropyLoss(weight=sc)
input = torch.tensor([[3.0, 4.0], [6.0, 9.0]])
target = torch.tensor([1, 0])
output = loss(input, target)
print(output)
# >> tensor(1.7529)
```

Now for the manual calculation, first softmax the `input`:

```python
print(softmax(input))
# >> tensor([[0.2689, 0.7311],
#            [0.0474, 0.9526]])
```

and then take the negative log of the correct-class probability and multiply by the respective weight:

`((-math.log(0.7311) * 0.36) - (math.log(0.0474) * 0.4)) / 2`

0.6662
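The arithmetic above, reproduced as a runnable check (using the rounded probabilities printed by the softmax):

```python
import math

# Softmax probabilities of the correct classes (from the printout above)
p0, p1 = 0.7311, 0.0474   # sample 0 -> class 1, sample 1 -> class 0
w0, w1 = 0.36, 0.4        # weights of those classes

# Dividing by the number of samples (2)
manual = (-math.log(p0) * w0 - math.log(p1) * w1) / 2
print(round(manual, 4))   # 0.6662
```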

What am I missing here?

Hello Mainul!

When using `CrossEntropyLoss (weight = sc)` with class weights
and the default `reduction = 'mean'`, the average loss that
is calculated is the weighted average. That is, you should divide
by the sum of the weights used for the samples, rather than by the
number of samples.
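In a recent PyTorch version, this can be checked directly with the numbers from the question (a sketch of the calculation, not the original 0.3.0 script):

```python
import torch
from torch import nn

sc = torch.tensor([0.4, 0.36])                  # per-class weights
input = torch.tensor([[3.0, 4.0], [6.0, 9.0]])
target = torch.tensor([1, 0])

# Built-in weighted loss with the default reduction='mean'
loss = nn.CrossEntropyLoss(weight=sc)(input, target)

# Manual version: weighted per-sample losses divided by the
# SUM of the weights used, not by the number of samples
probs = nn.Softmax(dim=1)(input)
num = -(sc[1] * torch.log(probs[0, 1]) + sc[0] * torch.log(probs[1, 0]))
den = sc[1] + sc[0]
print(loss.item(), (num / den).item())          # both ≈ 1.7529
```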

The following (PyTorch version 0.3.0) script illustrates this:

```python
import torch
torch.__version__

sc = torch.FloatTensor ([0.4,0.36])
loss = torch.nn.CrossEntropyLoss (weight = sc)
input = torch.autograd.Variable (torch.FloatTensor ([[3.0,4.0],[6.0,9.0]]))
target = torch.autograd.Variable (torch.LongTensor ([1,0]))
output = loss (input, target)
print (output)

probs = torch.nn.Softmax (dim = 1) (input)
output2 = -(torch.log (probs[0, 1]) * sc[1] + torch.log (probs[1, 0]) * sc[0]) / (sc[1] + sc[0])
print (output2)
print (((sc[1] + sc[0]) / 2.0) * output2)
```

Here is the output:

```python
>>> import torch
>>> torch.__version__
'0.3.0b0+591e73e'
>>>
>>> sc = torch.FloatTensor ([0.4,0.36])
>>> loss = torch.nn.CrossEntropyLoss (weight = sc)
>>> input = torch.autograd.Variable (torch.FloatTensor ([[3.0,4.0],[6.0,9.0]]))
>>> target = torch.autograd.Variable (torch.LongTensor ([1,0]))
>>> output = loss (input, target)
>>> print (output)
Variable containing:
 1.7529
[torch.FloatTensor of size 1]

>>>
>>> probs = torch.nn.Softmax (dim = 1) (input)
>>> output2 = -(torch.log (probs[0, 1]) * sc[1] + torch.log (probs[1, 0]) * sc[0]) / (sc[1] + sc[0])
>>> print (output2)
Variable containing:
 1.7529
[torch.FloatTensor of size 1]

>>> print (((sc[1] + sc[0]) / 2.0) * output2)
Variable containing:
 0.6661
[torch.FloatTensor of size 1]
```

You can see that the (weighted) `CrossEntropyLoss` and
“manual” results now match. And at the end we recover your manual
result by undoing the division by the sum of the weights.
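In current PyTorch versions the same bookkeeping can also be seen with `reduction='none'` (a sketch, not part of the 0.3.0 session above):

```python
import torch
from torch import nn

sc = torch.tensor([0.4, 0.36])
input = torch.tensor([[3.0, 4.0], [6.0, 9.0]])
target = torch.tensor([1, 0])

# Per-sample weighted losses, with no reduction applied
per_sample = nn.CrossEntropyLoss(weight=sc, reduction='none')(input, target)

# reduction='mean' divides by the sum of the weights used, not by 2
w_used = sc[target]   # weight of each sample's target class
mean_loss = nn.CrossEntropyLoss(weight=sc)(input, target)
print(mean_loss.item(), (per_sample.sum() / w_used.sum()).item())
```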

Best.

K. Frank
