I am training a modified VGG11 net on a dataset of images. I want to predict 40 labels (I use a 1D vector of length 40 with ones in the places where a label is active). One image can have multiple labels (most are not exclusive, some are). My dataset also has quite some class imbalance, which is hard to correct because most images carry multiple labels at once. The final activation layer is a sigmoid with 40 output nodes.
Which loss function is good for this purpose? I see that nn.NLLLoss gives me the possibility to weight the classes, whereas nn.MultiLabelMarginLoss doesn’t.
And when should I use MultiLabelSoftMarginLoss vs. MultiLabelMarginLoss?
Because it’s a multilabel classification problem, you should use BCEWithLogitsLoss. The output of your network should come from a Linear layer (raw logits, with no sigmoid applied, since BCEWithLogitsLoss applies the sigmoid internally), and those logits should go into this loss.
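To address the class imbalance mentioned in the question, BCEWithLogitsLoss also accepts a per-class pos_weight argument. Here is a minimal sketch; the counts are made up for illustration, and in practice you would compute them from your own dataset:

```python
import torch
import torch.nn as nn

num_labels = 40

# Hypothetical per-class positive counts -- replace with counts from
# your dataset. pos_weight = (#negatives / #positives) per class
# up-weights the rarer positive examples of each label.
total_samples = 1000.0
pos_counts = torch.randint(10, 1000, (num_labels,)).float()
pos_weight = (total_samples - pos_counts) / pos_counts

criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.randn(8, num_labels)                      # raw Linear-layer outputs (no sigmoid)
targets = torch.randint(0, 2, (8, num_labels)).float()   # multi-hot label vectors of length 40

loss = criterion(logits, targets)
```

Note that logits and targets share the shape (batch_size, num_labels), and each of the 40 labels is treated as its own binary problem.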
Hey, I have a multilabel classification problem that is not binary, i.e. each class can have more than two states (in my case: positive, negative, uncertain). Can I use BCEWithLogitsLoss for this, and if not, what loss function do you suggest?
Yes, you can. The target for BCEWithLogitsLoss can be a
probability (number between 0 and 1) – it is not required to be
a class label (e.g., a number that is either 0 or 1).
So understand your target to be the probability of the sample
being in the “positive” class. Then “positive” → P = 1.0,
“negative” → P = 0.0, and “uncertain” → P = 0.5.
(If you had your training data so annotated, you could, of course,
use finer gradations, such as “leaning negative” → P = 0.25,
and “leaning positive” → P = 0.75, or the like.)
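The mapping above can be sketched as follows; the annotation lists and the three-label setup are invented here purely for illustration:

```python
import torch
import torch.nn as nn

# Map the three annotation states to target probabilities,
# as described above: "positive" -> 1.0, "negative" -> 0.0,
# "uncertain" -> 0.5.
state_to_prob = {"positive": 1.0, "negative": 0.0, "uncertain": 0.5}

# Hypothetical annotations: 2 samples, 3 multilabel labels each.
annotations = [["positive", "uncertain", "negative"],
               ["negative", "positive", "uncertain"]]
targets = torch.tensor([[state_to_prob[s] for s in row] for row in annotations])

logits = torch.randn(2, 3)          # raw outputs from a Linear layer
criterion = nn.BCEWithLogitsLoss()  # accepts probabilities as targets
loss = criterion(logits, targets)
```

BCEWithLogitsLoss treats each target entry as the probability of the positive class, so non-integer values like 0.5 are perfectly valid.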
(In the future, it would probably help for forum search and navigation
if you could start a new thread with its own topic title for questions
like this.)
I have a follow-up query. I thought about the approach you suggested earlier, and the problem with it in my particular case is that I want my model not to learn from entries where the labels are “uncertain”. The reason for this is that, once the model is trained, I hope to use it to replace the “uncertain” labels with “positive” or “negative”.
Correct me if I’m wrong, but putting 0.5 there would bias my model on examples where it could otherwise easily predict “positive” or “negative”.
My understanding of your situation is the following:
You have annotated training data. You have samples that are
“positive” and are labeled “positive”, and similarly for “negative.”
You also have samples that are clearly “positive” or “negative”
but are labeled “uncertain” (for some reason).
If you have cases where the labels don’t match reality, I would
view those samples as not being (successfully) labeled, so I
would simply leave them out of my training data. If you don’t
want to do that (maybe, for a given sample, some of your
multilabel labels are correctly labeled, while a few are labeled
“uncertain”), then you can pass (per-sample, per-multilabel-label)
weights to the functional version of BCEWithLogitsLoss, namely
binary_cross_entropy_with_logits, and use 0 for the weights of
your “uncertain” instances.
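A minimal sketch of this masking approach, assuming “uncertain” entries are encoded as -1 in the target tensor (that encoding is my own choice for illustration):

```python
import torch
import torch.nn.functional as F

# Hypothetical targets: 2 samples, 3 multilabel labels each,
# with "uncertain" entries encoded as -1.
targets = torch.tensor([[1., 0., -1.],
                        [-1., 1., 0.]])

# Zero weight wherever the label is uncertain, so those entries
# contribute nothing to the loss (and produce no gradient).
weight = (targets >= 0).float()

# Replace the -1 placeholders with a dummy valid value; it has no
# effect because the corresponding weight is 0.
clean_targets = targets.clamp(min=0.0)

logits = torch.randn(2, 3)
loss = F.binary_cross_entropy_with_logits(logits, clean_targets, weight=weight)
```

Note that with the default reduction='mean', the zero-weighted entries still count in the denominator; if you want to average only over the certain entries, use reduction='sum' and divide by weight.sum() yourself.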