I am trying to use class weights with CrossEntropyLoss for a binary classification problem, and by now I am quite lost….
In my network I set the output size to 1 and have a sigmoid activation function at the end to ensure I get values between 0 and 1, which I take to be a probability. If the output size is set to 2 (one column for class 0 and one for class 1), then for some reason the two columns do not sum to 1. This is why I set the output size to 1.
The loss function requires two columns in the output, I assume one for class 0 and one for class 1, but in which order I am not sure. I set the weights as tensor([1., 5.]), assuming that I have five times more class 0 than class 1.
Thus in the training loop I have:
outputs = nn_model(X_batch)
I think this is the probability for class 0. The output has shape 1.
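For reference, a minimal sketch of the two-column setup I am asking about (the model and batch here are stand-ins; I am assuming that weight[i] applies to class i):

```python
import torch

# with CrossEntropyLoss the model must output 2 columns (class 0, class 1)
# and targets must be integer class labels, not one-hot or probabilities
weights = torch.tensor([1., 5.])  # weight class 1 five times, as it is 5x rarer
criterion = torch.nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(8, 2)           # batch of 8, two raw class scores each
targets = torch.randint(0, 2, (8,))  # integer labels 0 or 1
loss = criterion(logits, targets)
```

Note that CrossEntropyLoss applies log-softmax internally, so the model should output raw scores here, not sigmoid/softmax probabilities.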
Dear K. Frank,
Thank you for your time looking into it. I switched from BCELoss to CrossEntropyLoss because BCELoss has no option to add class weights.
I had not yet seen BCEWithLogitsLoss, however.
Am I right that here the output and the target will have the same shape?
I removed the sigmoid activation at the end, as it is already part of BCEWithLogitsLoss.
For reasons I don’t understand – I suppose that it’s just an inconsistency
or oversight – BCELoss lacks BCEWithLogitsLoss's pos_weight
argument. It’s not really an issue – BCEWithLogitsLoss should be
used anyways because of its better numerical stability.
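As a concrete sketch (the names and batch size here are made up for illustration): pos_weight scales the loss contribution of the positive (target = 1) examples, and the input and target do have the same shape:

```python
import torch

# pos_weight = 5 assumes the positive class is five times rarer,
# as in the original question
criterion = torch.nn.BCEWithLogitsLoss(pos_weight=torch.tensor([5.]))

logits = torch.randn(8, 1)                      # raw model outputs, no sigmoid
targets = torch.randint(0, 2, (8, 1)).float()   # same shape as logits
loss = criterion(logits, targets)
```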
I have one more follow-up question. For my output I wish to get probabilities; that is why I had a sigmoid function at the end of forward(). Now that I am using BCEWithLogitsLoss, I deleted the sigmoid function, and my output can be negative and can be bigger than 1 ;(
Do you know how to address this trouble? Or did I completely miss something in our previous discussion?
Yes, without the Sigmoid activation function, the output of your
model will be raw scores, so-called logits. They run from -inf to inf, and are what you want as the input to BCEWithLogitsLoss. Sigmoid maps logits to probabilities that run from 0.0 to 1.0.
You may well want probabilities for certain purposes. In such a case,
you still want your model to output logits (no Sigmoid) that you feed
to BCEWithLogitsLoss (and then backpropagate). When you want
probabilities, just apply Sigmoid to your logits (separately from your
model, loss function, and backpropagation) to convert them to
probabilities for whatever subsequent processing you have.
A word of explanation: The logits and probabilities contain the same
information and can be transformed mathematically back and forth
into one another. For numerical reasons it’s better to pass logits
from your model to your BCEWithLogitsLoss loss function.
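As a toy check of this equivalence (values here are made up for illustration): feeding logits to the with-logits loss and feeding sigmoid-ed probabilities to plain BCE give the same result for moderate logits; the logits version is simply the numerically safer path when logits get large and sigmoid saturates:

```python
import torch

logits = torch.tensor([-2.0, 0.0, 3.0])
targets = torch.tensor([0.0, 1.0, 1.0])

# loss computed directly from logits
loss_logits = torch.nn.functional.binary_cross_entropy_with_logits(logits, targets)
# same loss computed from probabilities
loss_probs = torch.nn.functional.binary_cross_entropy(torch.sigmoid(logits), targets)

assert torch.allclose(loss_logits, loss_probs)
```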
Let me answer your question(s) two different ways.
You could, but doing so would be overkill. You can just call the
function (or class) version of sigmoid() directly:
my_logits = my_model (my_batch)
my_probabilities = torch.nn.functional.sigmoid (my_logits)
# or instantiate a Sigmoid function object on the fly and call it
# my_probabilities = torch.nn.Sigmoid() (my_logits)
There is no need to wrap sigmoid() in a “mini-model” in order to
apply it to my_logits.
You generally don’t need the actual probabilities to calculate the
accuracy of your validation-set predictions. You just need to turn
the logits into binary yes/no predictions.
Also, you may want to calculate the validation-set loss, as well.
# assumes loss_criterion = BCEWithLogitsLoss (...)
with torch.no_grad():   # don't want or need gradients for validation calculations
    val_logits = my_model (val_batch)
    val_loss = loss_criterion (val_logits, val_targets)
    # do something with val_loss
    # assumes that val_targets are exactly 0.0 and 1.0
    val_binary_preds = (val_logits > 0.0).float()
    num_correct = (val_binary_preds == val_targets).sum()
    # use num_correct to calculate average validation accuracy, etc.
Note that a logit of zero corresponds to a probability of one half
(sigmoid (0.0) == 0.5). So thresholding logits against 0.0 gives
the same results as thresholding the corresponding probabilities
against 0.5, the idea being that probability > 0.5 means “yes”
(and probability <= 0.5 means “no”).
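A quick sanity check of this threshold equivalence (toy values for illustration):

```python
import torch

logits = torch.tensor([-1.5, 0.0, 0.2, 4.0])
probs = torch.sigmoid(logits)

# sigmoid (0.0) is exactly 0.5
assert torch.sigmoid(torch.tensor(0.0)).item() == 0.5
# thresholding logits at 0.0 gives the same yes/no predictions
# as thresholding probabilities at 0.5
assert torch.equal(logits > 0.0, probs > 0.5)
```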