One good way to think about a multi-label classification problem
is to understand it as a set of binary classification problems (in
your case, five binary classification problems) that all use the same
input run through the same network. That is, for a given input, each
of your five labels is either absent or present, so your network is
making five yes-or-no (binary) predictions.
Understanding your problem as a set of binary classifications, we
see that the appropriate loss function is BCEWithLogitsLoss (or,
less optimally, BCELoss). These loss functions have support for
multi-label problems built in.
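As a minimal sketch (the batch size, logit values, and labels here are made up for illustration, not taken from your code), a five-label setup with BCEWithLogitsLoss looks like:

```python
import torch
import torch.nn as nn

# hypothetical batch of 2 samples, each with 5 binary labels
logits = torch.tensor([[1.2, -0.7, 0.3, -2.1, 0.9],
                       [-0.4, 1.5, -1.0, 0.8, -0.2]])  # raw scores from the last linear layer
targets = torch.tensor([[1., 0., 1., 0., 1.],
                        [0., 1., 0., 1., 0.]])          # 0-or-1 presence of each label

# by default the loss is averaged over all 10 label slots
loss_fn = nn.BCEWithLogitsLoss()
loss = loss_fn(logits, targets)
print(loss.item())
```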
In this case I would say that three of your five predictions (one
“1” and two “0”s) were correct, and two (two “0”s that should have
been “1”s) were wrong.
This is a partial success (three out of five) but not perfect. So your
loss should not be 0 (assuming that 0 is the minimum value of your
loss function).
BCEWithLogitsLoss will, indeed, partially penalize this kind of
three-out-of-five prediction.
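To see this concretely (with made-up, confident logits), compare a fully correct prediction with a three-out-of-five one:

```python
import torch
import torch.nn as nn

loss_fn = nn.BCEWithLogitsLoss()
targets = torch.tensor([[1., 0., 0., 1., 1.]])

# confident, fully correct logits -> loss near zero
good = torch.tensor([[5., -5., -5., 5., 5.]])
# two of the "1" labels confidently predicted as "0" -> larger, but finite, loss
partial = torch.tensor([[5., -5., -5., -5., -5.]])

print(loss_fn(good, targets).item())     # small
print(loss_fn(partial, targets).item())  # larger: a partial penalty, not zero
```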
(Just to be clear, loss and accuracy are different things.)
I would count each correct prediction towards your overall
accuracy. If it turned out that all of your samples produced
three-out-of-five-correct predictions (an unlikely artificial
assumption), then your accuracy would be 60%.
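A simple way to count per-label correctness (again with illustrative numbers) is to threshold the logits at 0, which is the same as thresholding sigmoid(logit) at 0.5:

```python
import torch

logits = torch.tensor([[2.0, -1.0, -3.0, -0.5, -2.0]])  # five raw scores for one sample
targets = torch.tensor([[1., 0., 0., 1., 1.]])

# logit > 0 means predicted probability > 0.5, i.e. "label present"
preds = (logits > 0).float()
accuracy = (preds == targets).float().mean()
print(accuracy.item())  # 0.6 here: three of the five slots are correct
```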
Just a quick note about the specifics of BCEWithLogitsLoss:
Your labels (targets) are appropriately 0-or-1 binary class
labels. (These can be understood as probabilities – 0% chance
of the label being present vs. 100% chance. Furthermore, BCEWithLogitsLoss will actually accept probabilities between
0 and 1 for its targets – but you don’t have to use it this way.)
Your predictions, however, should not be 0-or-1 class labels
(nor probabilities), but rather logits – that is “raw scores” that
range from -inf to +inf. You would normally get these from
the last linear layer of your network with (in your case) five
outputs.
(You could use class labels / probabilities with BCELoss, but
using logits with BCEWithLogitsLoss is better.)
Thank you for your answer, but I believe you didn’t understand me.
My specific problem is a bit different from a classic multi-label problem:
I want to minimize my loss when the prediction is correct in only one class (or more),
and this is what I’m doing. I have a custom loss for my problem: