I somehow got a wrong result from BCELoss

I am trying to predict some masks using BCELoss. My code looks like this:

loss = F.binary_cross_entropy(predicts, targets, reduction='mean')

But the loss is always above 0.5, so I tried passing two identical masks to check BCELoss:

loss = F.binary_cross_entropy(same_mask, same_mask, reduction='mean')

Then I got this: tensor(0.4984), which I think should be zero.

I have been dealing with this problem for a while and still don’t know how to solve it.
Any suggestions would be really helpful.
For some details: the masks’ shapes are (28, 28) or (14, 14), and they have gone through a sigmoid function.
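
For reference, a minimal sketch of the check (the mask here is random, just for illustration; it stands in for a sigmoid output):

import torch
import torch.nn.functional as F

# A random mask in (0, 1), standing in for a sigmoid output.
same_mask = torch.sigmoid(torch.randn(28, 28))

# Comparing the mask against itself still gives roughly 0.5, not 0.
loss = F.binary_cross_entropy(same_mask, same_mask, reduction='mean')
print(loss)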

OK, I got it now.
The targets y should be numbers between 0 and 1. But I still can’t make my loss small…


May I ask why you use the functional version instead of the nn version?

If you look at the documentation, the nn version just uses the functional version in forward.
That’s why I think they are the same, and the functional version is simpler to code.
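
For example, a quick equivalence check (arbitrary tensors, just to illustrate):

import torch
import torch.nn as nn
import torch.nn.functional as F

pred = torch.rand(4, 4)    # arbitrary predicted probabilities
target = torch.rand(4, 4)  # arbitrary soft targets

# The module form and the functional form give the same result.
module_loss = nn.BCELoss()(pred, target)
functional_loss = F.binary_cross_entropy(pred, target)
print(torch.allclose(module_loss, functional_loss))  # True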

loss = F.binary_cross_entropy(same_mask, same_mask, reduction='mean')

Then I got this: tensor(0.4984), which I think should be zero.

Binary cross entropy is never zero when prediction and target are equal, unless both are (very close to) 0 or 1, because binary cross entropy (i.e., the negative log likelihood) is -[y * log(p) + (1 - y) * log(1 - p)]. With y = p, this reduces to the binary entropy of p, which is positive for any p strictly between 0 and 1. I made a plot for a pair of inputs where input = output to illustrate this:

import torch
import torch.nn.functional as F
import numpy as np
import matplotlib.pyplot as plt


def bce(y_pred, y_true):
    # Binary cross entropy for a single (prediction, target) pair.
    y_true_v = torch.tensor([y_true])
    y_pred_v = torch.tensor([y_pred])
    return F.binary_cross_entropy(y_pred_v, y_true_v, reduction='sum').numpy()


x_axis_values = np.arange(0.01, 1.0, 0.01)

# Evaluate BCE along the diagonal where prediction == target.
y = [bce(a, b) for a, b in zip(x_axis_values, x_axis_values)]

# Averages to roughly 0.5 over the sweep, matching the tensor(0.4984)
# reported above.
print(np.mean(y))

plt.plot(x_axis_values, y)
plt.ylabel('Cost')
plt.xlabel('Value of true label and predicted label')
plt.show()

[Plot: BCE cost on the y-axis vs. the shared value of the true and predicted label on the x-axis; the cost is zero only at the extremes and peaks in the middle.]
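
To connect this back to the original question: the loss only approaches zero when the targets are hard 0s and 1s and the predictions are confident. A small sketch (the tensors are invented for illustration):

import torch
import torch.nn.functional as F

# Hard 0/1 target mask with near-certain predictions: BCE approaches 0.
target = (torch.rand(28, 28) > 0.5).float()
pred = target.clamp(1e-3, 1 - 1e-3)  # confident, but not exactly 0 or 1
print(F.binary_cross_entropy(pred, target))  # ~0.001

# A soft mask compared against itself keeps its own entropy: ~0.5 on average.
soft = torch.rand(28, 28)
print(F.binary_cross_entropy(soft, soft))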

May I ask why you use the functional version instead of the nn version?

There is nothing against that, really. I use the functional versions all the time for things that don’t store state/parameters. In this case, both would also produce exactly the same results.
