V1.0.1, nn.BCEWithLogitsLoss returns negative loss, Sigmoid layer not deployed

Hi PyTorch community! I was using BCEWithLogitsLoss to train a multi-label network and got a negative loss, as shown below. The class documentation says this loss combines a Sigmoid layer and BCELoss in a single class, but that does not seem to be what actually happens. I also checked its definition in pytorch/torch/nn/functional.py and could not find a sigmoid operation in this loss. Maybe I used it wrong?

I’m using version 1.0.1

import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()
a = torch.tensor([[1., 1., 1., 0., 0.]])
b = torch.tensor([[0., 0.0011122, 8.9638, 0., 0.]])
c = nn.Sigmoid()(b)
loss = criterion(a, b)  # negative loss


Should line 598 of pytorch/torch/nn/modules/loss.py be changed as below?

        return F.binary_cross_entropy_with_logits(nn.Sigmoid()(input), target,

This might be related to this issue. I will take a look into it

Hello Rui An!

It appears that you have switched the order of your inputs to
BCEWithLogitsLoss.

BCEWithLogitsLoss (like binary_cross_entropy_with_logits()) expects to be
called with predictions that are logits (-infinity to infinity) and
targets that are probabilities (0 to 1), in that order.

Your a are legitimate probabilities, so they are your targets, and
your b are legitimate logits, so they are your predictions. Your
call should therefore be:

criterion(b, a)

(In your version of the call your second argument, b, is out of
range (not 0 to 1), so the call returns the invalid negative result.
c, however, being the result of Sigmoid, is in (0, 1), so the
result of the call is indeed positive.)
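To see the sign flip numerically, here is a pure-Python sketch (no torch needed) of the standard numerically stable per-element formula that binary_cross_entropy_with_logits computes, max(x, 0) - x*t + log1p(exp(-|x|)); the numbers are taken from the tensors above.

```python
from math import exp, log1p

def bce_with_logits(x, t):
    """Per-element BCE-with-logits loss, stable form:
    max(x, 0) - x*t + log1p(exp(-|x|))."""
    return max(x, 0.0) - x * t + log1p(exp(-abs(x)))

# Correct order: logit 8.9638 (from b), target 1.0 (from a).
print(bce_with_logits(8.9638, 1.0))   # small positive number

# Swapped order: "logit" 1.0 (from a), "target" 8.9638 (from b).
# The target is far outside [0, 1], so the loss goes negative.
print(bce_with_logits(1.0, 8.9638))   # about -7.65
```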

No. binary_cross_entropy_with_logits() has built
into it (implicitly, in effect) the sigmoid function (just like
BCEWithLogitsLoss), so you want to pass logits as the first
argument (input), not probabilities (nn.Sigmoid()(input)).

You shouldn’t expect to see an explicit call to Sigmoid in the
source code for BCEWithLogitsLoss. In effect Sigmoid is
applied to input, but it’s implicit (using the “log-sum-exp trick”).
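A pure-Python sketch of the idea (not the actual PyTorch source): an explicit sigmoid followed by cross-entropy agrees with the stable logits form for moderate inputs, but fails once the sigmoid underflows.

```python
from math import exp, log, log1p

def sigmoid(x):
    # branch to avoid overflow in exp()
    if x >= 0:
        return 1.0 / (1.0 + exp(-x))
    return exp(x) / (1.0 + exp(x))

def bce_naive(x, t):
    """Explicit sigmoid, then binary cross-entropy."""
    p = sigmoid(x)
    return -(t * log(p) + (1.0 - t) * log(1.0 - p))

def bce_stable(x, t):
    """Sigmoid folded in implicitly (stable form)."""
    return max(x, 0.0) - x * t + log1p(exp(-abs(x)))

# For moderate logits the two agree:
print(bce_naive(2.0, 1.0), bce_stable(2.0, 1.0))   # both ~0.1269

# For an extreme logit, exp(-800) underflows to 0.0, so sigmoid(-800)
# returns exactly 0 and log(0) fails:
# bce_naive(-800.0, 1.0)   # ValueError: math domain error
print(bce_stable(-800.0, 1.0))   # 800.0, still finite
```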

Good luck.

K. Frank


Hello KFrank, thanks a lot for the very detailed explanation! Could you also point me to the block where the “log-sum-exp” trick is applied? I traced it back to functional.py but still can’t figure out how it is done…

Hi Rui An!

I don’t have the binary_cross_entropy_with_logits code
in front of me, so I can’t give you the specifics.

The issue is that floating-point error can get amplified when you
compute log of an expression containing exp. sigmoid has
the exp, and cross-entropy has the log, so you can run into
this problem when using sigmoid as input to cross-entropy.
Dealing with this issue is the main reason that
binary_cross_entropy_with_logits exists.

See, for example, the comments about “log1p” in the Wikipedia
article about logarithm.
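As a small illustration of the log1p point (pure Python, not the PyTorch code itself):

```python
from math import log, log1p

x = 1e-16
# 1.0 + 1e-16 rounds to exactly 1.0 in double precision,
# so the naive expression loses all the information:
print(log(1.0 + x))   # 0.0
# log1p computes log(1 + x) without forming 1 + x first:
print(log1p(x))       # ~1e-16
```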

(I was speaking loosely when I mentioned the related
“log-sum-exp-trick.” This is more directly relevant to computing
softmax, but is basically another facet of the same issue. For
more on this, see, for example, Wikipedia’s article on LogSumExp.)
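For completeness, a minimal sketch of that trick: factoring the maximum out of log(sum(exp(x_i))) keeps the exponentials in range.

```python
from math import exp, log

def logsumexp(xs):
    """Stable log(sum(exp(x_i))): factor out the max first."""
    m = max(xs)
    return m + log(sum(exp(x - m) for x in xs))

xs = [1000.0, 1001.0]
# The naive version overflows, since exp(1000) is too large:
# log(sum(exp(x) for x in xs))   # OverflowError
print(logsumexp(xs))             # ~1001.3133
```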


K. Frank