Question about BCEWithLogitsLoss


Silly question, but when using:

> criterion = torch.nn.BCEWithLogitsLoss(size_average=True,reduce=False, reduction=None)

which gives the output of:

tensor([[[[-6.5510e-01,  3.3377e+00,  3.8248e+00,  ...,  3.8797e+00,
            7.2267e-01, -5.6749e-01],
          [ 4.6354e+00,  3.2439e+00,  2.5523e+00,  ...,  2.2153e+00,
            3.5900e+00,  2.6402e-01],
          [-9.9316e-01,  9.2173e-01, -1.6896e+00,  ...,  3.5983e+00,
            8.9532e-01, -6.6330e-01],
          [ 2.1913e+00,  2.4635e+00, -3.7076e-01,  ...,  2.8466e-01,
            1.7843e+00,  1.4812e+00],
          [-7.6295e-02,  3.0910e+00,  1.1081e+00,  ...,  3.8564e-02,
            1.6032e+00,  2.9802e-01],
          [-1.1393e+00,  2.7563e+00, -1.7027e+00,  ..., -4.7357e-01,
            2.0175e+00, -2.9758e-01]],

         [[-2.9676e+00, -9.7755e-01, -1.7805e+00,  ...,  1.2246e+00,
           -1.1476e+00,  7.8534e-01],
          [-1.6817e-01,  2.9306e+00,  5.1950e-01,  ...,  6.9879e-01,
            6.4816e-01,  2.9168e+00],
          [-1.2073e+00,  1.3979e-01, -4.0174e+00,  ...,  1.4663e+00,
           -1.5716e+00,  2.3580e+00],

If I understand BCEWithLogitsLoss correctly, it uses a sigmoid function, so the $64k question is: why do I get values in the output of BCE that are outside ±1?

I realise it's probably something I have done wrong, but I'm not sure what.




It seems you have misunderstood BCEWithLogitsLoss. It applies the sigmoid function to its inputs, not to its outputs. Here is the pipeline: x -> BCEWithLogitsLoss = x -> sigmoid -> BCELoss. (Note that BCELoss is a standalone loss in PyTorch too.)

If you look at the documentation of torch.nn.BCEWithLogitsLoss, it says “This loss combines a Sigmoid layer and the BCELoss in one single class. This version is more numerically stable than using a plain Sigmoid followed by a BCELoss as…”.

Also, this post may help you too.

By the way, sigmoid maps its input to the range (0, 1).
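A minimal sketch (not from the thread, with made-up tensors) that checks this pipeline numerically: BCEWithLogitsLoss applied to raw logits matches sigmoid followed by BCELoss, and with reduction='none' the per-element losses are all non-negative for valid targets.

```python
import torch

logits = torch.randn(2, 3)    # raw network outputs: any real values
targets = torch.rand(2, 3)    # valid targets lie in [0, 1]

with_logits = torch.nn.BCEWithLogitsLoss(reduction='none')
plain_bce = torch.nn.BCELoss(reduction='none')

loss_a = with_logits(logits, targets)                   # sigmoid is applied internally
loss_b = plain_bce(torch.sigmoid(logits), targets)      # sigmoid applied explicitly

print(torch.allclose(loss_a, loss_b, atol=1e-6))  # same pipeline, same losses
print((loss_a >= 0).all())                        # losses are non-negative
```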


doh! thanks for that…

Hi chaslie!

As Doosti explained, BCEWithLogitsLoss (in effect) passes your out
tensor through a sigmoid(), not its results, so it won't be forcing its
results to be in the range [0.0, 1.0].

Note, however, that logically your data tensor – the labels you pass
to BCEWithLogitsLoss – should be in the range [0.0, 1.0]. These
labels are normally either the discrete values 0.0 or 1.0 (and thought
of as binary class labels) or they are continuous values in [0.0, 1.0]
(and thought of as the probability of your sample being in class-“1”).

If you respect this restriction, BCEWithLogitsLoss will not return
negative loss values – its loss values will range from 0.0 to inf.
(If you do not respect this restriction, you will be calculating a loss
that doesn’t make sense, your gradients won’t make sense, and
your model won’t train properly.)

The fact that the output you get from BCEWithLogitsLoss contains
negative values means that you are breaking this rule and passing
data values outside of [0.0, 1.0]. (But this is your fault, not the
fault of BCEWithLogitsLoss.) On the other hand, the fact that the
output contains values greater than 1 is fine, and, in fact
BCEWithLogitsLoss diverges to inf for predictions that are
“completely wrong.”
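A minimal sketch (assumed values, not chaslie's actual data) illustrating both points above: targets in [0.0, 1.0] always give losses in [0.0, inf), possibly well above 1, while targets outside that range, such as labels normalised to (-1, 1), can produce negative "loss" values.

```python
import torch

criterion = torch.nn.BCEWithLogitsLoss(reduction='none')
logits = torch.tensor([-5.0, 0.0, 5.0])

good_targets = torch.tensor([0.0, 0.5, 1.0])     # valid: in [0, 1]
bad_targets = torch.tensor([-1.0, -1.0, -1.0])   # invalid: labels from (-1, 1)

good_loss = criterion(logits, good_targets)  # all >= 0 (can exceed 1)
bad_loss = criterion(logits, bad_targets)    # contains negative values

print(good_loss)
print(bad_loss)
```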


K. Frank


Hi KFrank,

you hit the nail on the head, the input was being normalised to (-1, 1); changing this solved the problem…