I don’t understand the proper use of BCEWithLogitsLoss for binary classification. I use it as the loss function for a discriminator whose outputs are around 0 or 1, and I removed the final Sigmoid activation from the network.
In a simple experiment with lf = nn.BCEWithLogitsLoss(), where res is the model output and target is the label, I get:
lf(res=0, target=0) = 0.6931
lf(res=0, target=0.5) = 0.6931
lf(res=0, target=1) = 0.6931
lf(res=0.5, target=0) = 0.9741
lf(res=0.5, target=0.5) = 0.7241
lf(res=0.5, target=1) = 0.4741
lf(res=1, target=0) = 1.3133
lf(res=1, target=0.5) = 0.8133
lf(res=1, target=1) = 0.3133
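For reference, these numbers can be reproduced by hand. The sketch below is plain Python rather than PyTorch. It assumes the documented behaviour that BCEWithLogitsLoss applies a sigmoid to its input before computing binary cross-entropy, and the helper names `sigmoid` and `bce_with_logits` are made up for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def bce_with_logits(logit: float, target: float) -> float:
    """Binary cross-entropy on a raw logit, for a single element.

    Numerically stable rewrite of
        -(t * log(sigmoid(x)) + (1 - t) * log(1 - sigmoid(x)))
    """
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


# Same grid of inputs as in the experiment above.
for logit in (0.0, 0.5, 1.0):
    for target in (0.0, 0.5, 1.0):
        loss = bce_with_logits(logit, target)
        print(f"res={logit}, target={target}: "
              f"p={sigmoid(logit):.4f}, loss={loss:.4f}")

# Inputs far from zero, to compare with the values around 0 and 1 used above.
print(f"res=-10, target=0: loss={bce_with_logits(-10.0, 0.0):.6f}")
print(f"res=+10, target=1: loss={bce_with_logits(10.0, 1.0):.6f}")
```

Under that assumption, an input of 0 corresponds to a probability of 0.5, which would explain a loss of 0.6931 (that is, ln 2) for every target. A confident prediction of class 0 would then be a large negative input rather than 0.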
Clearly I am using the loss function incorrectly: the loss is a constant 0.6931 whenever the model outputs 0, regardless of the target, and so the model ends up outputting values around 0 all the time.
What is the correct way to use it, and is there relevant documentation? Or is this loss function not suitable here?