Hi Saswat (and Srishti)!
It’s not clear what your use case is.
The first question, as Srishti mentioned, is whether you have a multi-label,
multi-class problem or a multi-class (single-label) problem.
Conceptually, is a given sample in exactly one class (say, class 3), but, for
training purposes, predicting class 2 or 5 would still be okay, so you don’t
want to penalize your model too heavily for doing so?
Or, conceptually, can a given sample be in multiple classes at the same
time, and in your example your sample really is in classes 3 and 2 and 5,
and you would like to penalize your model if it doesn’t predict all three (and
also penalize it if it were additionally to predict other classes, for example,
class 4 or class 7)?
In the first case you have a single-label, multi-class problem, but with
probabilistic (“soft”) labels, and you should use CrossEntropyLoss (and not
use softmax()). In your example your (soft) target might be a probability of
0.7 for class 3, a probability of 0.2 for class 2, and a probability of 0.1
for class 5 (and zero for everything else).
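Here is a minimal sketch of that first case (the num_classes = 8 and the
random logits are just illustrative stand-ins for your model and its output;
note that CrossEntropyLoss has accepted probabilistic targets since pytorch
version 1.10):

```python
import torch
import torch.nn as nn

num_classes = 8  # illustrative; use your actual number of classes
logits = torch.randn(1, num_classes)  # stand-in for your model's raw output

# soft (probabilistic) target -- requires pytorch >= 1.10
target = torch.zeros(1, num_classes)
target[0, 3] = 0.7
target[0, 2] = 0.2
target[0, 5] = 0.1

# CrossEntropyLoss applies log_softmax() internally, so do not
# apply softmax() to logits yourself
loss = nn.CrossEntropyLoss()(logits, target)
print(loss)
```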
In the second case you have a multi-label, multi-class problem, and you
should use BCEWithLogitsLoss (and no sigmoid() nor softmax()). In this case
your multi-label target might be 1.0 for classes 3, 2, and 5 (and 0.0 for all
other classes). You can also use probabilities for targets with
BCEWithLogitsLoss, e.g., perhaps, 0.9 for class 3, 0.8 for class 2, and 0.7
for class 5 (and 0.0 or close to 0.0 for all other classes).
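And here is a corresponding sketch of the second case (again, num_classes = 8
and the random logits are illustrative placeholders):

```python
import torch
import torch.nn as nn

num_classes = 8  # illustrative; use your actual number of classes
logits = torch.randn(1, num_classes)  # stand-in for your model's raw output

# multi-label target: the sample is in classes 3, 2, and 5
target = torch.zeros(1, num_classes)
target[0, [2, 3, 5]] = 1.0

# BCEWithLogitsLoss applies sigmoid() internally (in a numerically
# stable way), so do not apply sigmoid() to logits yourself
loss = nn.BCEWithLogitsLoss()(logits, target)

# probabilistic targets also work with BCEWithLogitsLoss
soft_target = torch.zeros(1, num_classes)
soft_target[0, 3] = 0.9
soft_target[0, 2] = 0.8
soft_target[0, 5] = 0.7
soft_loss = nn.BCEWithLogitsLoss()(logits, soft_target)
print(loss, soft_loss)
```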
First work out, conceptually, what kind of classification problem you have,
and then drill down into which loss function you should use and how you
should structure / interpret your target data.
Best.
K. Frank