Hi Prerna and Animesh!
You don’t want, in the typical case, BCELoss for classification problems. This is because BCELoss requires the predictions fed into it to be numbers in (0, 1) (probability-like numbers).
For example, Animesh’s output layer is a Linear,

self.fc_out = nn.Linear(100, 4)

that will, in general, output numbers ranging from -infinity to +infinity. You also don’t want to pass these outputs through a Sigmoid (or Softmax) layer to map them to (0, 1), because of numerical problems: for large-magnitude logits the sigmoid saturates to exactly 0.0 or 1.0, and the log() inside BCELoss then blows up (or gets clamped), corrupting both the loss and its gradients.
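To see the problem concretely, here is a small (made-up) example with an extreme, but perfectly legal, logit:

import torch
import torch.nn as nn

logit = torch.tensor([800.0])    # an extreme raw score
target = torch.tensor([0.0])

prob = torch.sigmoid(logit)      # saturates to exactly 1.0 in floating point
print(nn.BCELoss()(prob, target))             # tensor(100.), the log(0) term got clamped
print(nn.BCEWithLogitsLoss()(logit, target))  # tensor(800.), computed stably

BCELoss silently clamps its log() outputs at -100, so the first loss is simply wrong, while BCEWithLogitsLoss uses the log-sum-exp trick internally and gets the right answer.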
You want, instead, a loss function that takes logit-like predictions (that run from -infinity to +infinity), such as BCEWithLogitsLoss.
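So the flow is just the following (a minimal sketch, with a made-up batch and labels):

import torch
import torch.nn as nn

fc_out = nn.Linear(100, 4)                     # Animesh's final layer
loss_fn = nn.BCEWithLogitsLoss()               # consumes raw logits directly

features = torch.randn(8, 100)                 # hypothetical batch of 8 samples
targets = torch.randint(0, 2, (8, 4)).float()  # a 0 / 1 label for each of 4 classes

logits = fc_out(features)                      # note: no Sigmoid here
loss = loss_fn(logits, targets)
loss.backward()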
MultiLabelSoftMarginLoss and BCEWithLogitsLoss are essentially the same function, as Peter explains here.
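You can check this numerically yourself; with the default "mean" reductions the two losses agree up to floating-point round-off:

import torch
import torch.nn as nn

logits = torch.randn(8, 4)
targets = torch.randint(0, 2, (8, 4)).float()

loss_bce = nn.BCEWithLogitsLoss()(logits, targets)
loss_mlsm = nn.MultiLabelSoftMarginLoss()(logits, targets)
print(torch.allclose(loss_bce, loss_mlsm))   # True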
Good luck.
K. Frank