I’m doing some experiments with multi-label classification and I was trying to calculate the mean of the loss manually. I found that there is a big difference from the mean that nn.BCEWithLogitsLoss reports, and I don’t get why that’s the case.

import math
import torch

pre = torch.tensor([-8., 0., 4., -2.])
des = torch.tensor([0., 0., 1., 0.])

def bll(pre, des):
    bitloss = 0
    for p, d in zip(pre, des):
        sp = torch.sigmoid(p)
        # per-element binary cross-entropy
        loss = -(d * math.log(sp) + (1 - d) * math.log(1 - sp))
        bitloss += loss
        print(loss)
    return bitloss

ll = bll(pre, des)
BCEWLL = ll / len(pre)
print(f"MEAN:{BCEWLL}")
Output:
tensor(0.0003)
tensor(0.6931)
tensor(0.0181)
tensor(0.1269)
MEAN:0.20964014530181885
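For reference, here is the sanity check I'd run (my own addition, assuming the same pre/des tensors): the manual per-element mean can be compared directly against the built-in loss with its default reduction="mean".

```python
import torch
import torch.nn as nn

pre = torch.tensor([-8., 0., 4., -2.])
des = torch.tensor([0., 0., 1., 0.])

# Built-in: applies sigmoid internally, then averages the per-element BCE
builtin = nn.BCEWithLogitsLoss()(pre, des)

# Manual: same formula, vectorized, then mean
manual = -(des * torch.log(torch.sigmoid(pre))
           + (1 - des) * torch.log(1 - torch.sigmoid(pre))).mean()

print(builtin.item(), manual.item())  # both should be ~0.2096
```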

The only thing I wonder is: how can -8 have such a low loss value when it’s the farthest away?
According to log(1 - 1/(1 + e^8)) it should come out negative as well.
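As a side note (my understanding, not from the thread): one reason to prefer BCEWithLogitsLoss over a manual sigmoid followed by log is numerical stability for extreme logits like -8. A standard stable rewriting uses the log-sum-exp trick; a sketch of the equivalence:

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-8., 0., 4., -2.])   # logits
y = torch.tensor([0., 0., 1., 0.])     # targets

# Naive form: sigmoid first, then log (can underflow for very large |x|)
naive = -(y * torch.log(torch.sigmoid(x))
          + (1 - y) * torch.log(1 - torch.sigmoid(x)))

# Stable log-sum-exp form: max(x, 0) - x*y + log(1 + exp(-|x|))
stable = x.clamp(min=0) - x * y + torch.log1p(torch.exp(-x.abs()))

# The built-in functional version for comparison
builtin = F.binary_cross_entropy_with_logits(x, y, reduction="none")

print(naive)
print(stable)
print(builtin)
```

All three agree on these inputs; the stable form just never evaluates log of a quantity that has already underflowed to 0.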

EDIT:
Well, I’ll answer my question myself. -8 gets the lowest loss precisely because it is the farthest from the decision boundary on the correct side: sigmoid(-8) ≈ 0 already matches the target 0, so there is nothing to correct. The 0 logit sits right on the boundary (sigmoid(0) = 0.5), nearest to the wrong class, so it gets the biggest loss.
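To make that concrete (my own sketch, not part of the original post): with target 0, the loss just measures how far the predicted probability sigmoid(logit) is from 0, so the logit sitting on the decision boundary hurts the most.

```python
import torch

# BCE loss for target 0 at a few logits: the more negative the logit,
# the closer sigmoid(logit) is to the target 0, and the smaller the loss
losses = {}
for logit in [-8., -2., 0.]:
    p = torch.sigmoid(torch.tensor(logit))      # predicted probability
    losses[logit] = -torch.log(1 - p).item()    # BCE with target 0
    print(f"logit {logit:+.0f} -> p = {p:.4f}, loss = {losses[logit]:.4f}")
```

The ordering loss(-8) < loss(-2) < loss(0) matches the per-element values printed above.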