Confused about BCEWithLogitsLoss use (as discriminator loss)

I don’t understand the proper use of BCEWithLogitsLoss for binary classification. I am using it as the loss function of a discriminator whose output should be around 0 or 1, and I removed the final Sigmoid activation function.

In a simple experiment with lf = nn.BCEWithLogitsLoss(), I get:
lf(res=0, target=0) = 0.6931
lf(res=0, target=0.5) = 0.6931
lf(res=0, target=1) = 0.6931
lf(res=0.5, target=0) = 0.9741
lf(res=0.5, target=0.5) = 0.7241
lf(res=0.5, target=1) = 0.4741
lf(res=1, target=0) = 1.3133
lf(res=1, target=0.5) = 0.8133
lf(res=1, target=1) = 0.3133

Clearly I am using the loss function incorrectly: the loss is a constant 0.6931 whenever the model outputs 0, regardless of the target, and the model ends up outputting around 0 all the time.

What is a correct use case? Or relevant documentation? Or is that loss function not relevant here?

nn.BCEWithLogitsLoss expects logits as the model output, not probabilities.
Internally, it computes the following loss for each element (taken from the docs, with the weight term removed):

-[y_n ⋅ log σ(x_n) + (1 − y_n) ⋅ log(1 − σ(x_n))]

If you pass x_n = 0, the formula simplifies to:

-[y_n ⋅ log(0.5) + (1 − y_n) ⋅ log(0.5)]

using the fact that σ(0) = 0.5. Factoring out log(0.5) then gives:

-log(0.5) ⋅ [y_n + 1 − y_n] = -log(0.5) ≈ 0.6931

so the loss at x_n = 0 is the same for every target y_n.
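
To check the derivation numerically, here is a minimal sketch that evaluates the unweighted formula directly in plain Python and reproduces the nine values from the question. The helper names sigmoid and bce_with_logits are made up for this illustration and are not PyTorch functions. The naive formula is fine for small inputs like these, but it is not numerically stable for large |x_n|, so treat it as a check and not as a replacement for nn.BCEWithLogitsLoss.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function: maps a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def bce_with_logits(x: float, y: float) -> float:
    """Unweighted per-element loss from the formula above (hypothetical helper).

    x is the raw model output (logit), y is the target in [0, 1].
    """
    p = sigmoid(x)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


# Same grid of model outputs and targets as in the question.
table = {
    (x, y): round(bce_with_logits(x, y), 4)
    for x in (0.0, 0.5, 1.0)
    for y in (0.0, 0.5, 1.0)
}

for (x, y), loss in table.items():
    print(f"res={x:.1f} target={y:.1f} loss={loss:.4f}")
```

The first three rows (res=0) all print 0.6931, which is log(2). For the other rows the loss drops as the logit moves toward the side that matches the target, which is the gradient signal the discriminator trains on.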

If you would like to pass the input as probabilities, use nn.BCELoss instead.
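
As a rough illustration of how the two losses relate, the sketch below feeds the same raw scores to nn.BCEWithLogitsLoss directly and to nn.BCELoss after an explicit torch.sigmoid. The tensor values are arbitrary examples, not taken from your model.

```python
import torch
import torch.nn as nn

# Raw discriminator outputs (logits): no Sigmoid applied, any real value is valid.
logits = torch.tensor([0.0, 0.5, 1.0, -3.0])
# Targets must be floats in [0, 1] for both losses.
targets = torch.tensor([0.0, 1.0, 1.0, 0.0])

# Option 1: pass logits directly; the sigmoid is applied inside the loss.
loss_from_logits = nn.BCEWithLogitsLoss()(logits, targets)

# Option 2: convert to probabilities first, then use BCELoss.
probs = torch.sigmoid(logits)
loss_from_probs = nn.BCELoss()(probs, targets)

print(loss_from_logits.item(), loss_from_probs.item())
```

Both calls should give the same value up to floating-point error. If you keep BCEWithLogitsLoss for training, you can still recover a probability at inference time with torch.sigmoid(logits); a logit of 0 then means "50/50", not "class 0".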

I think I understand, thank you! So the target is still a probability, while the model’s output is an unbounded logit and the loss is non-negative but unbounded above.