Just to confirm: are you really working with a “multi-target”
(multi-label) classification problem? I ask because the sample
targets you show in your .png image have either 0 (in just row 2)
or 1 (in the rest of the rows) of the labels set.

That is, even though you didn’t show such a row, could you have
a row (say, row 17) for which all four fields are set to 1?

If not, you have a multi-class (but not multi-label) classification
problem, and you should recast it as such, and (most likely)
use nn.CrossEntropyLoss as your loss function.
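If it does turn out to be single-label multi-class, a minimal sketch (shapes and data are illustrative, not from your post): CrossEntropyLoss takes raw logits and integer class indices, so one-hot target rows would be converted with argmax().

```python
import torch
import torch.nn as nn

# Illustrative shapes: batch of 3 samples, 4 classes.
logits = torch.randn(3, 4)  # raw outputs of a Linear layer -- no softmax

# One-hot targets like those in the .png, recast as class indices.
one_hot = torch.tensor([[0., 0., 1., 0.],
                        [1., 0., 0., 0.],
                        [0., 0., 0., 1.]])
targets = one_hot.argmax(dim=1)  # tensor([2, 0, 3])

loss = nn.CrossEntropyLoss()(logits, targets)  # applies log_softmax internally
```

Note that CrossEntropyLoss expects the class-index targets as a LongTensor of shape (batch,), not one-hot floats.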

The general consensus is that MultiLabelSoftMarginLoss
is the loss function to start with for a multi-label classification
problem.
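In the multi-label case the targets stay as multi-hot 0/1 float vectors, and the raw logits go straight into the loss; a minimal sketch with made-up data:

```python
import torch
import torch.nn as nn

logits = torch.randn(3, 4)  # raw Linear outputs (logits)

# Multi-hot targets: each row may have any number of labels set,
# including zero labels or all four.
targets = torch.tensor([[1., 0., 1., 0.],
                        [0., 0., 0., 0.],
                        [1., 1., 1., 1.]])

loss = nn.MultiLabelSoftMarginLoss()(logits, targets)
```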

You don’t want – in the typical case – BCELoss for classification
problems. This is because BCELoss requires the predictions fed
into it to be numbers in (0, 1) (probability-like numbers).

For example, Animesh’s output layer is a Linear

self.fc_out = nn.Linear(100, 4)

that will, in general, output numbers ranging from -infinity to
+infinity. You also don’t want to pass these outputs through
a Sigmoid (or Softmax) layer to map them to (0, 1), because
Sigmoid saturates to exactly 0 or 1 for large-magnitude inputs,
and the log() inside the loss then blows up.
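You can see the failure mode with a made-up extreme logit: Sigmoid underflows to exactly 0, so BCELoss has to clamp the resulting log (0), while BCEWithLogitsLoss works on the logit directly and gets the right answer.

```python
import torch
import torch.nn as nn

logit = torch.tensor([[-200.0]])  # extreme, but a perfectly legal logit
target = torch.tensor([[1.0]])

prob = torch.sigmoid(logit)         # underflows to exactly 0.0 in float32
naive = nn.BCELoss()(prob, target)  # log (0) has to be clamped internally

# Computed from the logit itself -- numerically stable, gives ~200.0:
stable = nn.BCEWithLogitsLoss()(logit, target)
```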

You want, instead, a loss function that takes logit-like
predictions (that run from -infinity to +infinity), such as BCEWithLogitsLoss.

MultiLabelSoftMarginLoss and BCEWithLogitsLoss
are essentially the same function, as Peter explains here:
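A quick numerical check of that equivalence, with made-up data: under the default 'mean' reductions the two losses produce the same value (up to floating-point noise).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(5, 4)
targets = torch.randint(0, 2, (5, 4)).float()  # random multi-hot labels

a = nn.MultiLabelSoftMarginLoss()(logits, targets)
b = nn.BCEWithLogitsLoss()(logits, targets)
print(torch.allclose(a, b))  # True -- same function up to floating point
```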