Confused about BCEWithLogitsLoss use (as discriminator loss)

I don’t understand the proper use of BCEWithLogitsLoss for binary classification. I used it as the loss function of a discriminator whose output should be around 0 or 1, and I removed the final Sigmoid activation.

Doing a simple experiment where I set lf = nn.BCEWithLogitsLoss(), I get
lf(res=0, target=0) = 0.6931
lf(res=0, target=0.5) = 0.6931
lf(res=0, target=1) = 0.6931
lf(res=0.5, target=0) = 0.9741
lf(res=0.5, target=0.5) = 0.7241
lf(res=0.5, target=1) = 0.4741
lf(res=1, target=0) = 1.3133
lf(res=1, target=0.5) = 0.8133
lf(res=1, target=1) = 0.3133
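
A runnable version of this experiment (a minimal sketch; the scalar values have to be wrapped in float tensors, since the loss does not accept plain Python numbers):

```python
import torch
import torch.nn as nn

lf = nn.BCEWithLogitsLoss()

# sweep the same (res, target) grid as the table above
for res in (0.0, 0.5, 1.0):
    for target in (0.0, 0.5, 1.0):
        loss = lf(torch.tensor([res]), torch.tensor([target]))
        print(f"lf(res={res}, target={target}) = {loss.item():.4f}")
```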

Clearly I am using the loss function wrong, because the loss is a constant 0.6931 whenever the model returns 0, and the model ends up returning around 0 all the time.

What is a correct use case? Or relevant documentation? Or is that loss function not relevant here?


nn.BCEWithLogitsLoss expects logits as the model output, not probabilities.
Internally, this loss formula is used (taken from the docs, with the weight term removed):

-[y_n · log(σ(x_n)) + (1 − y_n) · log(1 − σ(x_n))]

If you pass x_n = 0, the formula simplifies to:

-[y_n · log(0.5) + (1 − y_n) · log(0.5)]

using the fact that σ(0) = 0.5. Simplifying further:

-log(0.5) · [y_n + (1 − y_n)] = -log(0.5) ≈ 0.6931

If you would like to pass the input as probabilities, use nn.BCELoss instead.
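
A quick sketch of the relationship (variable names are illustrative): applying the sigmoid manually and using nn.BCELoss gives the same value as feeding raw logits to nn.BCEWithLogitsLoss, though the logits version is more numerically stable:

```python
import torch
import torch.nn as nn

logits = torch.randn(4)   # unbounded raw model outputs
targets = torch.rand(4)   # probabilities in [0, 1]

loss_logits = nn.BCEWithLogitsLoss()(logits, targets)
loss_probs = nn.BCELoss()(torch.sigmoid(logits), targets)

print(loss_logits.item(), loss_probs.item())  # same value (up to float error)
```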


I think I understand, thank you! The target is still a probability, then, while the model’s output (a logit) is unbounded, and the loss value is not confined to [0, 1] either.
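
For completeness, a minimal sketch of what a discriminator could look like under this convention (layer sizes and names are made up for illustration): the last layer is a plain Linear with no Sigmoid, and its raw output goes straight into the loss:

```python
import torch
import torch.nn as nn

# hypothetical discriminator: no final Sigmoid, the last
# Linear layer emits an unbounded logit per sample
disc = nn.Sequential(
    nn.Linear(64, 32),
    nn.LeakyReLU(0.2),
    nn.Linear(32, 1),
)

criterion = nn.BCEWithLogitsLoss()

x = torch.randn(8, 64)           # a batch of input features
real = torch.ones(8, 1)          # target probability 1.0 for "real"
loss = criterion(disc(x), real)  # logits in, scalar loss out
```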