Better for binary classification

Why is CrossEntropyLoss better than BCEWithLogitsLoss for a multi-class classification problem? I heard CrossEntropyLoss was better, but I can't wrap my head around why.

nn.BCEWithLogitsLoss is used for binary or multi-label classification (each sample can belong to zero, one, or more classes), not for multi-class classification (each sample belongs to exactly one class).
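
For example, here is a minimal sketch with random logits (the values don't matter, only the shapes and target types do):

```python
import torch
import torch.nn as nn

# 4 samples, 3 classes; raw logits, no sigmoid/softmax applied
logits = torch.randn(4, 3)

# Multi-label: each sample can belong to zero, one, or more classes,
# so the target is a float tensor of 0s/1s with the same shape as the logits.
multi_label_target = torch.tensor([[1., 0., 1.],
                                   [0., 0., 0.],
                                   [1., 1., 1.],
                                   [0., 1., 0.]])
loss_bce = nn.BCEWithLogitsLoss()(logits, multi_label_target)

# Multi-class: each sample belongs to exactly one class,
# so the target is a LongTensor of class indices with shape [batch_size].
multi_class_target = torch.tensor([2, 0, 1, 2])
loss_ce = nn.CrossEntropyLoss()(logits, multi_class_target)
```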

You could use both criteria for a binary classification, though.
With nn.BCEWithLogitsLoss it would be a vanilla binary classification, while nn.CrossEntropyLoss would treat the use case as a 2-class multi-class classification.
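
Here is a small sketch of the same binary problem set up both ways (random logits and a toy batch size, just for illustration):

```python
import torch
import torch.nn as nn

batch_size = 4
target = torch.randint(0, 2, (batch_size,))  # binary labels in {0, 1}

# Vanilla binary classification: one output logit per sample,
# target as a float tensor with the same shape as the logits.
logits_bce = torch.randn(batch_size, 1)
loss_bce = nn.BCEWithLogitsLoss()(logits_bce, target.float().unsqueeze(1))

# Same problem as a 2-class multi-class classification: two output
# logits per sample, target as a LongTensor of class indices.
logits_ce = torch.randn(batch_size, 2)
loss_ce = nn.CrossEntropyLoss()(logits_ce, target)
```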

So for a binary classification problem (Y in {0, 1}), if I want to use nn.BCEWithLogitsLoss, the output of my model should be the raw scores for class=1 and have shape (N), where N is the batch size, right?

The model should output the raw logits in the shape [batch_size, 1], and the target should have the same shape, for a binary classification.
The suggested shape of [batch_size] would most likely also work, but I would recommend creating the class dimension explicitly (just my two cents :wink: ).
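
For example, a minimal sketch showing both shapes with nn.BCEWithLogitsLoss (random values, purely for illustration):

```python
import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()

# Recommended: keep the explicit class dimension.
output = torch.randn(4, 1)                       # raw logits, shape [batch_size, 1]
target = torch.tensor([[1.], [0.], [0.], [1.]])  # same shape, float
loss = criterion(output, target)

# Shape [batch_size] also works, as long as output and target match.
output_flat = output.squeeze(1)                  # shape [batch_size]
target_flat = target.squeeze(1)
loss_flat = criterion(output_flat, target_flat)  # same value as above
```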
