From my understanding, BCEWithLogitsLoss should yield the same results as BCELoss applied to sigmoid outputs, the only difference being that the former (BCEWithLogitsLoss) is numerically more stable.
However, when I test their behavior, I get significantly different results as soon as I deal with logits with values around 1e2.
import torch
from torch import nn

preds = torch.rand(10)
preds[0] = 1e2  # one very large logit
labels = torch.zeros(10)

criterion = nn.BCELoss()
print(criterion(nn.Sigmoid()(preds), labels))  # outputs tensor(3.5969)

criterion = nn.BCEWithLogitsLoss()
print(criterion(preds, labels))  # outputs tensor(10.8338)
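To narrow this down, here is a minimal sketch that compares the two losses element by element instead of as a mean. The fixed seed and the names `per_elem_bce`, `per_elem_logits`, and `diff` are my own additions for illustration, and I am assuming the default float32 dtype.

```python
import torch
from torch import nn

torch.manual_seed(0)  # arbitrary seed, only so the run is repeatable

preds = torch.rand(10)
preds[0] = 1e2  # the single large logit
labels = torch.zeros(10)

# reduction="none" returns one loss value per element instead of the mean
per_elem_bce = nn.BCELoss(reduction="none")(torch.sigmoid(preds), labels)
per_elem_logits = nn.BCEWithLogitsLoss(reduction="none")(preds, labels)

# In float32, sigmoid(100) rounds to exactly 1.0, so BCELoss has to
# evaluate log(1 - 1.0) for that element and guard against log(0).
print(torch.sigmoid(preds[0]) == 1.0)

# Absolute difference between the two losses, element by element
diff = (per_elem_bce - per_elem_logits).abs()
print(per_elem_bce)
print(per_elem_logits)
print(diff)
```

If only the first entry of `diff` is large, the mismatch comes from the saturated logit rather than from the remaining values. The value BCELoss reports for that entry may depend on the PyTorch version, so I have not listed expected output here.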
I am using PyTorch 1.2.0.
Could someone please tell me whether I am doing something wrong, or whether this behavior is expected?
Thanks in advance!