How to use BCE loss and CrossEntropyLoss correctly?


I have set up a pretrained resnet50 with data parallelism for a multi-class problem and use nn.CrossEntropyLoss():

import torch
import torch.nn as nn
import torchvision.models as models

model = models.resnet50(pretrained=True)
model = torch.nn.DataParallel(model)
for p in model.parameters():
    p.requires_grad = False  # freeze the pretrained backbone
num_ftrs = model.module.fc.in_features
model.module.fc = nn.Linear(num_ftrs, num_classes)  # new trainable head

However, I’m unsure of how to use BCE loss. I have read that it is better to use nn.BCELoss() for two classes; however, I don’t know whether I need to define a sigmoid layer, and if so, where to put it…


The docs will give you some information about these loss functions as well as small code snippets.

For a binary classification, you could either use nn.BCE(WithLogits)Loss with a single output unit or nn.CrossEntropyLoss with two outputs.
Usually nn.CrossEntropyLoss is used for multi-class classification, but you could also treat the binary use case as a 2-class classification; it’s up to you which approach you prefer.

If you are using the former approach, we generally recommend using nn.BCEWithLogitsLoss and passing raw logits to this criterion, as it yields better numerical stability than sigmoid + nn.BCELoss.
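A minimal sketch of the first approach (the shapes and target tensor here are hypothetical placeholders, not from the original post): the model head has a single output unit, and the raw logits go straight into the criterion.

```python
import torch
import torch.nn as nn

# Hypothetical batch of 4 samples; the model's final layer has 1 output unit.
logits = torch.randn(4, 1)                     # raw logits, no sigmoid applied
targets = torch.randint(0, 2, (4, 1)).float()  # binary targets must be floats

criterion = nn.BCEWithLogitsLoss()
loss = criterion(logits, targets)  # sigmoid + BCE are fused internally
```

Note that the targets are floats of the same shape as the logits, not long class indices.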

The latter use case also expects raw logits, which can be passed to nn.CrossEntropyLoss.
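And a sketch of the second approach, again with hypothetical shapes: the head produces two logits per sample, and the targets are class indices (dtype long), not one-hot vectors.

```python
import torch
import torch.nn as nn

logits = torch.randn(4, 2)           # raw logits, one column per class
targets = torch.randint(0, 2, (4,))  # class indices 0 or 1, dtype long

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, targets)    # log_softmax + NLL computed internally
```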


Ah ok cool.

So I’m under the assumption that softmax is automatically computed inside nn.CrossEntropyLoss, and nn.BCE(WithLogits)Loss also applies sigmoid internally, whereas for nn.BCELoss you need to apply sigmoid first? And if so, can I send the output of sigmoid + nn.BCELoss to nn.CrossEntropyLoss?

No, this wouldn’t work, since nn.BCELoss already calculates the loss.
nn.CrossEntropyLoss expects a model output and targets, not another loss.
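To make the distinction concrete (hypothetical tensors for illustration): sigmoid + nn.BCELoss and nn.BCEWithLogitsLoss compute the same quantity, and either way the result is a scalar loss, which is not a valid input for another criterion.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(4, 1)
targets = torch.randint(0, 2, (4, 1)).float()

# Mathematically equivalent; the fused version is the numerically stabler one:
loss_fused = nn.BCEWithLogitsLoss()(logits, targets)
loss_manual = nn.BCELoss()(torch.sigmoid(logits), targets)

# Both are 0-dim scalars, not model outputs, so there is nothing meaningful
# to pass on to nn.CrossEntropyLoss afterwards.
print(loss_fused.shape)  # torch.Size([])
```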


Apologies, I just understood what you meant! Many thanks for your help.