Manual Mean != BCEWithLogitsLoss Mean

I’m doing some experiments with multi-label classification and tried to calculate the mean of the loss manually, but I got a big difference to the BCEWithLogitsLoss mean and I don’t get why that’s the case.

Here is my code:

import math, torch, torch.nn as nn
pre = torch.tensor([0.133, 0.145, 0.692, 0.030])
des = torch.tensor([0., 0., 1., 0.])
torch.sigmoid(pre)
--> tensor([0.5332, 0.5362, 0.6664, 0.5075])

l1 = -(1*(math.log(0.5332))+(1-0)*math.log(1-0.5332))
--> 1.3907130692473197

l2 = -(1*(math.log(0.5362))+(1-0)*math.log(1-0.5362))
--> 1.391549907340984

l3 = -(1*(math.log(0.6664))+(1-1)*math.log(1-0.6664))
--> 0.4058651881295041

l4 = -(1*(math.log(0.5075))+(1-0)*math.log(1-0.5075))
--> 1.3865193864361882

1/4*(l1+l2+l3+l4)
--> 1.143661887788499



And this is the PyTorch implementation:

import math, torch, torch.nn as nn
criterion = nn.BCEWithLogitsLoss(reduction='mean')
pre = torch.tensor([0.133, 0.145, 0.692, 0.030])
des = torch.tensor([0., 0., 1., 0.])
criterion(pre, des)
--> tensor(0.6611)

If I set the reduction to ‘none’, only l3 (loss 3) has the same value:

tensor([0.7619, 0.7683, 0.4058, 0.7083])

Could somebody please explain to me, why that’s the case? :upside_down_face:
Thanks!

You have a mistake in the labels in your manual implementation.
It should be:

l1 = -(0*(math.log(0.5332))+(1-0)*math.log(1-0.5332))

l2 = -(0*(math.log(0.5362))+(1-0)*math.log(1-0.5362))

l3 = -(1*(math.log(0.6664))+(1-1)*math.log(1-0.6664))

l4 = -(0*(math.log(0.5075))+(1-0)*math.log(1-0.5075))
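
With the corrected labels the manual mean reproduces the PyTorch numbers. A quick vectorized check, reusing the tensors from the first post:

import torch, torch.nn as nn
pre = torch.tensor([0.133, 0.145, 0.692, 0.030])
des = torch.tensor([0., 0., 1., 0.])

sp = torch.sigmoid(pre)
# per-element BCE, with the labels taken from des instead of hard-coded 1s
losses = -(des * torch.log(sp) + (1 - des) * torch.log(1 - sp))
losses
--> tensor([0.7619, 0.7683, 0.4058, 0.7083])
losses.mean()
--> tensor(0.6611)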


:man_facepalming:t3:
I really wonder sometimes where I left my brain.

Thank you! That was the mistake I made.

And now I’m getting a loss that I understand :smiley:

BCEWLL:

import torch, torch.nn as nn
samples_weight = torch.ones(4)  # unit weights, so pos_weight has no effect here
criterion = nn.BCEWithLogitsLoss(reduction='none', pos_weight=samples_weight)
pre = torch.tensor([-8., 0., 4., -2.])
des = torch.tensor([0., 0., 1., 0.])

criterion(pre, des)
--> tensor([3.3569e-04, 6.9315e-01, 1.8150e-02, 1.2693e-01])

torch.mean(criterion(pre,des))
--> tensor(0.2096)
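
Note: pos_weight only rescales the positive term, -pos_weight * d * log(sigmoid(p)), so with all-ones weights it matches the plain manual formula. A minimal sketch with a made-up weight of 2, just for illustration:

w2 = torch.full((4,), 2.)  # hypothetical weight, not from the original post
nn.BCEWithLogitsLoss(reduction='none', pos_weight=w2)(pre, des)
# only the label-1 element doubles: 0.0181 -> 0.0363; the label-0 losses are unchanged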

Manual:

import math, torch

pre = torch.tensor([-8., 0., 4., -2.])
des = torch.tensor([0., 0., 1., 0.])

def bll(pre, des):
  bitloss = 0
  for p, d in zip(pre, des):
    sp = torch.sigmoid(p)  # logit -> probability
    # per-element binary cross-entropy
    loss = -(d * math.log(sp) + (1 - d) * math.log(1 - sp))
    bitloss += loss
    print(loss)
  return bitloss

ll = bll(pre, des)
BCEWLL = (1 / len(pre)) * ll  # mean over the four elements
print(f"MEAN:{BCEWLL}")

--> tensor(0.0003)
    tensor(0.6931)
    tensor(0.0181)
    tensor(0.1269)
    MEAN:0.20964014530181885
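
By the way, the loop isn’t needed; the functional form gives the same numbers directly (this is the same math BCEWithLogitsLoss uses, just without the module wrapper):

import torch
import torch.nn.functional as F

pre = torch.tensor([-8., 0., 4., -2.])
des = torch.tensor([0., 0., 1., 0.])

F.binary_cross_entropy_with_logits(pre, des, reduction='none')
--> tensor([3.3569e-04, 6.9315e-01, 1.8150e-02, 1.2693e-01])
F.binary_cross_entropy_with_logits(pre, des)  # reduction defaults to 'mean'
--> tensor(0.2096)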

The only thing I still wonder about is how -8 can have such a low loss value when it’s the farthest away. According to log(1 - 1/(1 + e^8)), it should also be a negative value.

EDIT:
Well, I’ll answer my question myself. Lowest loss because it’s the farthest away and thus “very sure” -> no danger of interfering with the true prediction. The 0 prediction is the nearest, so the biggest loss is there.

Sigmoid of -8 is a very small number, so the prediction is correct: it is close to the label 0. So it makes sense that the loss is low.

The expression you wrote is close to log(1), which is 0.
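
Concretely, for the logit -8 with target 0:

import math
sp = 1 / (1 + math.exp(8))  # sigmoid(-8), about 3.35e-04
-math.log(1 - sp)           # about 3.35e-04, i.e. almost -log(1) = 0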
