Hi PyTorch community! I was using BCEWithLogitsLoss to train a multi-label network and got a negative loss, as shown below. According to the class documentation, this loss function combines a sigmoid and BCELoss, but that does not seem to be what happens, as the output shows. I also checked its definition in pytorch/torch/nn/functional.py and could not find the sigmoid operation in this loss. Maybe I am using it wrong?

I’m using version 1.0.1

import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()
a = torch.tensor([[1., 1., 1., 0., 0.]])
b = torch.tensor([[0., 0.0011122, 8.9638, 0., 0.]])
c = nn.Sigmoid()(b)
print(criterion(a, b))  # tensor(-0.7278)
print(criterion(a, c))  # tensor(0.6652)

Should line 598 of pytorch/torch/nn/modules/loss.py be changed as below?

It appears that you have switched the order of your inputs to BCEWithLogitsLoss.

BCEWithLogitsLoss (like binary_cross_entropy_with_logits()) expects to be
called with predictions that are logits (-infinity to infinity) and
targets that are probabilities (0 to 1), in that order.

Your a are legitimate probabilities, so they are your targets, and
your b are legitimate logits, so they are your predictions. Your
call should therefore be:

print(criterion(b,a))

(In your version of the call, the second argument, b, is out of
range (not 0 to 1), so the call returns an invalid negative result.
c, however, being the output of Sigmoid, lies in (0, 1), so the
result of that call is indeed positive.)
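To see where the negative number comes from, here is a minimal pure-Python sketch of the per-element formula usually given for binary cross-entropy on logits, max(x, 0) - x*z + log(1 + exp(-|x|)), averaged over the elements. The helper name bce_with_logits is hypothetical, and the formula is my assumption about what PyTorch effectively computes rather than a copy of its source. It does reproduce the two values from the original post.

```python
import math

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy computed from logits (hypothetical helper)."""
    total = 0.0
    for x, z in zip(logits, targets):
        # The -x*z term can go arbitrarily negative once z is outside [0, 1].
        total += max(x, 0.0) - x * z + math.log1p(math.exp(-abs(x)))
    return total / len(logits)

a = [1.0, 1.0, 1.0, 0.0, 0.0]               # probabilities (targets)
b = [0.0, 0.0011122, 8.9638, 0.0, 0.0]      # logits (predictions)
c = [1.0 / (1.0 + math.exp(-x)) for x in b]  # sigmoid(b)

swapped = bce_with_logits(a, b)          # arguments in the order used in the post
swapped_sigmoid = bce_with_logits(a, c)  # still swapped, but c lies in (0, 1)
correct = bce_with_logits(b, a)          # logits first, targets second

print(round(swapped, 4))          # -0.7278, as in the post
print(round(swapped_sigmoid, 4))  # 0.6652, as in the post
print(correct)                    # positive, roughly 0.55
```

The third element is what drives the swapped call negative: with x = 1 and z = 8.9638, the -x*z term alone contributes about -8.96.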

No, that line should not be changed. binary_cross_entropy_with_logits() has
the sigmoid function built into it (implicitly, in effect), just like
BCEWithLogitsLoss, so you want to pass logits as the first
argument (input), not probabilities (nn.Sigmoid()(input)).

You shouldn’t expect to see an explicit call to Sigmoid in the
source code for BCEWithLogitsLoss. In effect Sigmoid is
applied to input, but it’s implicit (using the “log-sum-exp trick”).
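One way to convince yourself that the sigmoid really is in there is to compare the fused loss against the explicit two-step version. This is only a sketch using the tensors from the original post, with the arguments in the corrected order. The variable names fused and two_step are mine.

```python
import torch
import torch.nn as nn

logits = torch.tensor([[0., 0.0011122, 8.9638, 0., 0.]])   # b from the post
targets = torch.tensor([[1., 1., 1., 0., 0.]])             # a from the post

# Fused version: takes raw logits and applies the sigmoid implicitly.
fused = nn.BCEWithLogitsLoss()(logits, targets)

# Two-step version: explicit sigmoid, then plain BCELoss on probabilities.
two_step = nn.BCELoss()(torch.sigmoid(logits), targets)

print(fused, two_step)
print(torch.allclose(fused, two_step, atol=1e-6))
```

For moderate logits like these the two should agree to within floating-point rounding. They only part ways for extreme logits, which is where the fused version earns its keep.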

Hello KFrank, thanks a lot for the very detailed explanation! Could you also point me to the block where “log-sum-exp” is applied? I traced it back to functional.py but still can’t figure out how it is done…

I don’t have the binary_cross_entropy_with_logits code
in front of me, so I can’t give you the specifics.

The issue is that floating-point error can get amplified when you
compute log of an expression containing exp. sigmoid has
the exp, and cross-entropy has the log, so you can run into
this problem when using sigmoid as input to cross-entropy.
Dealing with this issue is the main reason that binary_cross_entropy_with_logits exists.

See, for example, the comments about “log1p” in the Wikipedia
article about logarithm.
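To make the floating-point issue concrete, here is a small pure-Python sketch comparing the two-step computation (sigmoid, then log) with a rearranged form that works on the logit directly. Both helper names are hypothetical, and the rearranged formula is the commonly quoted stable form, which I am assuming is equivalent to what binary_cross_entropy_with_logits does rather than quoting its code.

```python
import math

def naive_bce(x, z):
    """Two-step form: sigmoid first, then log (hypothetical helper)."""
    p = 1.0 / (1.0 + math.exp(-x))
    try:
        return -(z * math.log(p) + (1.0 - z) * math.log(1.0 - p))
    except ValueError:
        # p rounded to exactly 0.0 or 1.0, so a log(0.0) was attempted.
        return math.inf

def stable_bce(x, z):
    """Rearranged form that never takes the log of a rounded probability."""
    return max(x, 0.0) - x * z + math.log1p(math.exp(-abs(x)))

# For a moderate logit the two forms agree.
print(naive_bce(2.0, 1.0), stable_bce(2.0, 1.0))

# For a large logit with target 0, sigmoid(40) rounds to exactly 1.0,
# so the naive form ends up taking log(0.0); the stable form returns about 40.
print(naive_bce(40.0, 0.0), stable_bce(40.0, 0.0))

# The log1p point in isolation: 1.0 + 1e-20 rounds to 1.0 before log sees it.
print(math.log(1.0 + 1e-20), math.log1p(1e-20))
```

Note that PyTorch's BCELoss clamps its log outputs instead of returning infinity, so the two-step version there degrades more quietly than this sketch does, but the loss of precision is the same issue.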

(I was speaking loosely when I mentioned the related
“log-sum-exp trick.” That trick is more directly relevant to computing
softmax, but is basically another facet of the same issue. For
more on this, see, for example, Wikipedia’s LogSumExp article.)
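For completeness, a minimal pure-Python sketch of the log-sum-exp trick as it is usually described for softmax: subtract the maximum before exponentiating so that exp never overflows. The function names are hypothetical and this is an illustration of the idea, not PyTorch's implementation.

```python
import math

def logsumexp(xs):
    """log(sum(exp(x))) computed by shifting by the maximum first."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def log_softmax(xs):
    """Log of the softmax probabilities, built on logsumexp."""
    lse = logsumexp(xs)
    return [x - lse for x in xs]

scores = [1000.0, 1000.0, 999.0]

# The direct route fails: math.exp(1000.0) raises OverflowError.
try:
    direct = math.log(sum(math.exp(x) for x in scores))
except OverflowError:
    direct = None

print(direct)              # None, because exp overflowed
print(logsumexp(scores))   # finite, slightly above 1000
print(log_softmax(scores))
```

The shift by the maximum does not change the mathematical result, since the factor exp(m) pulled out of the sum turns back into +m once the log is taken.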