What is the difference between BCELoss() and CrossEntropyLoss()?

I set batch_size to 32, and the following are input and labels

```
torch.Size([32, 3, 256, 256])
tensor([0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1,
        1, 0, 1, 1, 1, 1, 0, 0])
```

```python
import torch.nn as nn
from torchvision import models
from torchvision.models import Swin_B_Weights

net = models.swin_b(weights=Swin_B_Weights.DEFAULT)
num_ftrs = net.head.in_features
net.head = nn.Linear(num_ftrs, 2)
criterion_train = nn.BCELoss()
```

I am trying to use a Swin Transformer to complete a binary classification task. When I use BCELoss, I get:

**ValueError: Using a target size (torch.Size([32])) that is different to the input size (torch.Size([32, 2])) is deprecated. Please ensure they have the same size.**

But the program runs normally when I use CrossEntropyLoss, so I'm confused: what should I do if I want to use BCELoss?

BCELoss, that is, binary cross entropy, only needs one model output per target, because a binary probability can be represented by a single value between 0 and 1. Your head produces two outputs per sample (the [32, 2] input size in the error), while the target has one value per sample ([32]), which is exactly the size mismatch being reported.
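A minimal sketch of the BCELoss version, assuming the rest of your setup stays the same (the batch here is dummy data standing in for your real inputs and labels):

```python
import torch
import torch.nn as nn
from torchvision import models
from torchvision.models import Swin_B_Weights

net = models.swin_b(weights=Swin_B_Weights.DEFAULT)
num_ftrs = net.head.in_features
net.head = nn.Linear(num_ftrs, 1)              # one output unit instead of two

criterion = nn.BCELoss()

inputs = torch.randn(32, 3, 256, 256)          # dummy batch with your shapes
labels = torch.randint(0, 2, (32,)).float()    # BCELoss needs float targets

probs = torch.sigmoid(net(inputs)).squeeze(1)  # [32, 1] -> [32], values in (0, 1)
loss = criterion(probs, labels)                # shapes now match: [32] vs [32]
```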

Contrast that with, say, the 10 possible labels in CIFAR-10: there you need 10 outputs to see which class the model is "guessing" as the correct answer.
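For comparison, this is why your current two-output head runs fine with CrossEntropyLoss: it expects one raw score per class and integer class indices as targets (dummy tensors below stand in for the model output):

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
logits = torch.randn(32, 2)          # one raw score per class, shape [32, 2]
labels = torch.randint(0, 2, (32,))  # integer class indices, shape [32]
loss = criterion(logits, labels)     # no size mismatch here
```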

By the way, for numerical stability during training, it is recommended to pass the raw logits (i.e. no final Sigmoid activation) to BCEWithLogitsLoss. See here: BCEWithLogitsLoss — PyTorch 2.1 documentation
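A sketch of that variant, again with dummy tensors in place of the model output:

```python
import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()           # applies the sigmoid internally
logits = torch.randn(32)                     # raw outputs of a 1-unit head, no sigmoid
labels = torch.randint(0, 2, (32,)).float()  # float targets, same shape as logits
loss = criterion(logits, labels)
```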