Proper loss function for multilabel images with imbalanced data


I am training a modified VGG11 network on an image dataset. I want to predict 40 labels, encoded as a 1D vector of length 40 with ones at the positions of the active labels. One image can have multiple labels (most are not mutually exclusive, some are). My dataset is also quite imbalanced, and it is hard to rebalance because most images carry multiple labels. The final activation layer is a sigmoid function with 40 output nodes.

Which loss function is good for this purpose? I see that nn.NLLLoss lets me weight the classes, whereas nn.MultiLabelMarginLoss doesn't.

And when should I use MultiLabelSoftMarginLoss vs MultiLabelMarginLoss?


Because it’s a multilabel classification, you should use BCEWithLogitsLoss. The output of your network should come from a Linear layer and go directly into this loss, without a sigmoid in between, because BCEWithLogitsLoss applies the sigmoid internally.
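For the imbalance part of the question, BCEWithLogitsLoss accepts a per-label `pos_weight` tensor. The following is a minimal sketch, not the original poster's setup: the feature size of 512, the random `train_targets`, and the negatives-over-positives weighting heuristic are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

num_labels = 40
batch_size = 8

# Hypothetical multi-hot training targets, used only to estimate label frequencies.
train_targets = (torch.rand(1000, num_labels) < 0.1).float()

# Assumed heuristic: pos_weight[c] = (#negatives of label c) / (#positives of label c).
# The clamp avoids division by zero for a label that never occurs.
pos_counts = train_targets.sum(dim=0).clamp(min=1.0)
neg_counts = train_targets.shape[0] - pos_counts
pos_weight = neg_counts / pos_counts

# Stand-in for the last layer of the network: raw logits, no sigmoid.
head = nn.Linear(512, num_labels)
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

features = torch.randn(batch_size, 512)  # placeholder for the VGG features
logits = head(features)
targets = train_targets[:batch_size]

loss = criterion(logits, targets)
loss.backward()

# At inference time, apply the sigmoid explicitly to get per-label probabilities.
probs = torch.sigmoid(logits.detach())
```

Rare labels get a `pos_weight` above 1, so their positive examples contribute more to the loss. Whether this particular weighting helps depends on the data.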


Hi, thanks for your reply. Cool, because that was the one I was using. I only got confused when I read that people went for the other options.

Hey, I have a multilabel classification problem which is not binary, i.e. each class can have more than 2 states (in my case: positive, negative, uncertain). Can I use BCEWithLogitsLoss for this, and if not, what loss function do you suggest?

Hi Gopal!

Yes, you can. The target for BCEWithLogitsLoss can be a
probability (number between 0 and 1) – it is not required to be
a class label (e.g., a number that is either 0 or 1).

So understand your target to be the probability of the sample
being in the “positive” class. Then “positive” --> P = 1.0,
“negative” --> P = 0.0, and “uncertain” --> P = 0.5.

(If you had your training data so annotated, you could, of course,
use finer gradations, such as “leaning negative” --> P = 0.25,
and “leaning positive” --> P = 0.75, or the like.)
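As a small illustration of probability-valued targets, here is a sketch that maps the three annotation states to targets and evaluates the loss per element. The `label_to_prob` dictionary, the example annotations, and the logits are made up for this example.

```python
import torch
import torch.nn as nn

# Hypothetical mapping from annotation to target probability.
label_to_prob = {"negative": 0.0, "uncertain": 0.5, "positive": 1.0}

# Two samples with three multilabel labels each (illustrative values).
annotations = [
    ["positive", "negative", "uncertain"],
    ["negative", "uncertain", "positive"],
]
targets = torch.tensor([[label_to_prob[a] for a in row] for row in annotations])

# Raw network outputs (logits), chosen by hand for the example.
logits = torch.tensor([[2.0, -1.0, 0.0], [0.5, 0.0, -3.0]])

# reduction="none" keeps one loss value per (sample, label) pair.
per_element = nn.BCEWithLogitsLoss(reduction="none")(logits, targets)
mean_loss = per_element.mean()
```

For a target of 0.5 the per-element loss is smallest when the logit is 0, that is, when the predicted probability is also 0.5, so the model is pushed toward predicting "uncertain" on those entries.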

(In the future, it would probably help for forum search and navigation
if you could start a new thread with its own topic title for questions
like this.)


K. Frank


Hey K. Frank,
Thank you for the quick reply.

I have a follow-up query. I thought about the approach you suggested earlier, and the problem with it in my particular case is that I want my model not to learn from the entries whose labels are “uncertain”. The reason is that once the model is trained, I hope to use it to replace the “uncertain” labels with “positive” or “negative”.

Correct me if I’m wrong, but wouldn’t a target of 0.5 bias my model on examples that it could otherwise easily predict as “positive” or “negative”?

Hello Gopal!

My understanding of your situation is the following:

You have annotated training data. You have samples that are
“positive” and are labeled “positive”, and similarly for “negative.”
You also have samples that are clearly “positive” or “negative”
but are labeled “uncertain” (for some reason).

If you have cases where the labels don’t match reality, I would
view those samples as not being (successfully) labeled, so I
would simply leave them out of my training data. If you don’t
want to do that (Maybe, for a given sample, some of your
multilabel labels are correctly labeled, while a few are labeled
“uncertain.”), then you can pass (per-sample, per-multilabel-label)
weights to the functional version of BCEWithLogitsLoss, namely
torch.nn.functional.binary_cross_entropy_with_logits, and use 0
for the weights of your “uncertain” instances.
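The zero-weight approach can be sketched as follows. The `UNCERTAIN` sentinel of -1 and the normalization by the number of unmasked entries are assumptions for this example, not something stated in the thread.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

UNCERTAIN = -1.0  # hypothetical sentinel marking "uncertain" annotations

# Two samples, three labels each: 1 = positive, 0 = negative, -1 = uncertain.
raw = torch.tensor([[1.0, 0.0, UNCERTAIN],
                    [UNCERTAIN, 1.0, 0.0]])

# Weight 1 for labeled entries, 0 for "uncertain" entries.
mask = (raw != UNCERTAIN).float()

# The target value at masked positions does not matter because its weight is 0;
# clamp only keeps the targets inside [0, 1].
targets = raw.clamp(min=0.0)

logits = torch.randn(2, 3, requires_grad=True)  # stand-in for network output

# Sum the weighted per-element losses, then average over labeled entries only.
loss_sum = F.binary_cross_entropy_with_logits(
    logits, targets, weight=mask, reduction="sum"
)
loss = loss_sum / mask.sum().clamp(min=1.0)
loss.backward()
```

With the default `reduction="mean"` the sum would instead be divided by the total number of entries, masked ones included. Either way, the gradient at the "uncertain" positions is zero, so those entries do not influence training.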

Good luck.

K. Frank
