Anomaly in Binary Cross Entropy Loss for batches, and using weights to handle imbalance

New to PyTorch here, trying to figure out BCELoss.
Let's say I have a batch size of 6, and my network produces a single value per sample (really a sigmoid output; I'm using plain 1s and 0s here for simplicity).
If I run:

criterion    = nn.BCELoss()
prediction   = torch.tensor([[1.],[0.],[0.],[1.],[0.], [0.]])
label        = torch.tensor([[1.],[0.],[0.],[1.],[0.], [0.]])
loss         = criterion(input=prediction,target=label)
print(loss.item())
0

I get a loss value of 0, which is perfectly fine; the same happens if the batch size is 5 or some other value.

Now let's consider a case where only 1 prediction in the batch is off.

For 6 elements:

criterion  = nn.BCELoss()
prediction = torch.tensor([[1.],[0.],[0.],[1.],[0.], [0.]])
label      = torch.tensor([[0.],[0.],[0.],[1.],[0.], [0.]])
loss       = criterion(input=prediction,target=label)
print(loss.item())
4.605170249938965

For 5 elements:

criterion  = nn.BCELoss()
prediction = torch.tensor([[1.],[0.],[0.],[1.], [0.]])
label      = torch.tensor([[0.],[0.],[0.],[1.], [0.]])
loss       = criterion(input=prediction,target=label)
print(loss.item())
5.5262041091918945

What am I doing wrong here to get different loss values for different batch sizes?

Also, how would one use weights to handle class imbalance here?
Let's say for every n negative sample(s) I have p positive sample(s).

Thanks. :slight_smile:

By default, the output of BCELoss is averaged over the number of elements in the batch (reduction='mean'). In your example, the first loss is divided by 6 and the second by 5, which is why the two values differ even though the unreduced loss is the same.
You can set reduction to 'none' or 'sum' to see the unreduced output. For example:

criterion  = nn.BCELoss(reduction='none')
prediction = torch.tensor([[1.],[0.],[0.],[1.], [0.]])
label      = torch.tensor([[0.],[0.],[0.],[1.], [0.]])
loss       = criterion(input=prediction,target=label)
print(loss)

The output would be

tensor([[27.6310],
        [-0.0000],
        [-0.0000],
        [-0.0000],
        [-0.0000]])
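
To connect this back to the numbers in the question: averaging that tensor gives the mean-reduced value, and summing it gives the unscaled total, which is the same for both batch sizes here because the extra elements contribute 0. Using the same loss tensor as above:

print(loss.mean().item())  # ≈ 27.6310 / 5 ≈ 5.5262, the default BCELoss() value
print(loss.sum().item())   # ≈ 27.6310, what reduction='sum' would return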

For the 6 elements:

criterion  = nn.BCELoss(reduction='none')
prediction = torch.tensor([[1.],[0.],[0.],[1.],[0.], [0.]])
label      = torch.tensor([[0.],[0.],[0.],[1.],[0.], [0.]])
loss       = criterion(input=prediction,target=label)
print(loss)

The output would be

tensor([[27.6310],
        [-0.0000],
        [-0.0000],
        [-0.0000],
        [-0.0000],
        [-0.0000]])
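
As for weighting for class imbalance: nn.BCELoss accepts a weight argument, but it expects one weight per batch element, so you would have to build that tensor from the labels yourself. The usual shortcut is nn.BCEWithLogitsLoss with pos_weight, which takes the raw (pre-sigmoid) outputs and rescales only the positive term. A minimal sketch, assuming n negatives for every p positives (the numbers below are made up for illustration):

import torch
import torch.nn as nn

n, p = 10.0, 1.0                                # e.g. 10 negative samples per positive
criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([n / p]))

logits = torch.tensor([[2.0], [-1.5], [-3.0]])  # raw network outputs, no sigmoid applied
labels = torch.tensor([[1.0], [0.0], [0.0]])
loss   = criterion(input=logits, target=labels)
print(loss.item())

With pos_weight = n / p, each positive sample contributes roughly as much to the loss as the n negatives it is outnumbered by.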