Why is CrossEntropyLoss better than BCEWithLogitsLoss for a multi-class classification problem? I heard CrossEntropyLoss was better, but I can’t wrap my head around why.
`nn.BCEWithLogitsLoss` is used for binary or multi-label classification (each sample can belong to zero, one, or more classes), not for multi-class classification (each sample belongs to exactly one class).
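For instance, here is a minimal multi-label sketch (the shapes and tensor names are just for illustration):

```python
import torch
import torch.nn as nn

# Multi-label setup: 3 classes, each sample can have any subset of them active.
criterion = nn.BCEWithLogitsLoss()
logits = torch.randn(4, 3)                     # one raw logit per class
targets = torch.randint(0, 2, (4, 3)).float()  # independent 0/1 target per class
loss = criterion(logits, targets)
```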
You could use both criteria for a binary classification, though.
With `nn.BCEWithLogitsLoss` it would be a vanilla binary classification, while `nn.CrossEntropyLoss` would treat the use case as a 2-class multi-class classification.
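Here is a sketch of both setups on the same binary problem (the random tensors just stand in for model outputs and labels); the key differences are the logit shape and the target dtype:

```python
import torch
import torch.nn as nn

batch_size = 4

# Vanilla binary classification: one logit per sample,
# float targets in {0., 1.} with the same shape as the logits.
bce = nn.BCEWithLogitsLoss()
logits_bce = torch.randn(batch_size, 1)                # raw score for class=1
targets_bce = torch.randint(0, 2, (batch_size, 1)).float()
loss_bce = bce(logits_bce, targets_bce)

# The same problem treated as a 2-class multi-class classification:
# one logit per class, long targets holding the class indices.
ce = nn.CrossEntropyLoss()
logits_ce = torch.randn(batch_size, 2)                 # raw scores for both classes
targets_ce = torch.randint(0, 2, (batch_size,))        # class indices 0 or 1
loss_ce = ce(logits_ce, targets_ce)
```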
So for a binary classification problem (Y in {0, 1}), if I want to use `nn.BCEWithLogitsLoss`, then the output of my model should be the raw scores for class=1 and have shape `[N]`, where N is the batch size, right?
The model should output the raw logits in the shape `[batch_size, 1]`, and the target should have the same shape for a binary classification. The suggested shape of `[batch_size]` would most likely also work, but I would recommend creating the class dimension explicitly (just my two cents).
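As a concrete sketch of those shapes (the linear layer here is only a placeholder model):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)                       # placeholder model with a single output logit
criterion = nn.BCEWithLogitsLoss()

x = torch.randn(8, 10)                         # batch of 8 samples with 10 features
targets = torch.randint(0, 2, (8, 1)).float()  # float targets in shape [batch_size, 1]

logits = model(x)                              # raw logits in shape [batch_size, 1]
loss = criterion(logits, targets)
loss.backward()
```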