I am working on a semantic segmentation problem with 2D data. I would like to segment each image into 5 classes.
The dataset is sparsely labelled: in each image only a small portion is labelled, usually with just one class. Often, other regions of the same image should carry that same label, and other parts of the image should carry one of the other labels, but those regions are not annotated.
I have tried naively applying CrossEntropy loss with 5 classes, but without success. It seems that I should probably weight every class depending on the amount of data that is annotated, but I suspect a simple weight per class would not do the job here. Is there a way to do that with the loss functions that are already implemented in PyTorch, or should I rather look at implementing my own loss function?
Why wouldn’t using class weights (as supported, for example, by CrossEntropyLoss) work for your use case?
What specifically do you understand your problem to be?
If your dataset is so sparsely annotated that you don’t really have
enough annotated data to train with, you’re kind of stuck.
If your problem is just that your annotated data is unbalanced with
respect to how often your various classes are, in fact, labelled,
then compensating for the imbalance by using class weights is a
reasonable approach.
Thanks for getting back to me.
I think the problem I have is twofold. Class imbalance, which (as far as I understand) can be addressed by weighting the classes, is just one part of it, and probably the smaller one. I have tried adding such weights and my network dies (gives me all-zero outputs) after a few epochs, just as it does without weights. I was wondering whether there is a simple way to provide a binary mask marking where I have labels, so that only those pixels are included in the loss function?
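The binary-mask idea asked about here can be sketched with loss functions PyTorch already provides. This is only a minimal sketch, not something taken from the thread: the helper name `masked_cross_entropy`, the tensor shapes, and the toy data are all assumptions. It computes the per-pixel loss with `reduction="none"` and averages it over the labelled pixels only.

```python
import torch
import torch.nn.functional as F


def masked_cross_entropy(logits, target, mask):
    """Mean cross entropy over the pixels where mask is True.

    Hypothetical helper. Assumed shapes:
      logits: (N, C, H, W) float
      target: (N, H, W) long; must hold a valid class index everywhere,
              even at unlabelled pixels (the value there is ignored)
      mask:   (N, H, W) bool, True where a label exists
    """
    per_pixel = F.cross_entropy(logits, target, reduction="none")  # (N, H, W)
    mask = mask.to(per_pixel.dtype)
    # clamp avoids dividing by zero if a batch has no labelled pixels
    return (per_pixel * mask).sum() / mask.sum().clamp(min=1.0)


# Toy data: 2 images, 5 classes, 4x4 pixels, top-left 2x2 patch labelled.
torch.manual_seed(0)
logits = torch.randn(2, 5, 4, 4, requires_grad=True)
target = torch.randint(0, 5, (2, 4, 4))
mask = torch.zeros(2, 4, 4, dtype=torch.bool)
mask[:, :2, :2] = True

loss = masked_cross_entropy(logits, target, mask)
loss.backward()
```

After `backward()`, the gradient of the logits at unmasked pixels is zero, so the network's predictions there do not influence training.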
Let me speak as if you have images that are labelled on a per-pixel
basis, and that have only a small fraction of their pixels actually
labelled. Each pixel is either unlabelled, or is labelled with one
of five classes.
(My comments apply more generally to unlabelled / labelled data,
but it’s easier for me to speak concretely.)
I would treat this as a six-class classification problem, but use
class weights that assign a weight of zero to the “unlabelled”
class. (The five weights for the five real classes could either be
all one, or could be non-trivial weights that compensate for an
imbalance among those classes.)
Now (because they get multiplied by zero) your unlabelled pixels
won’t contribute to your loss function – your loss function and
training won’t care what your network predicted for those
pixels.
Does this make sense (and would it apply to / work for your use
case)?
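A minimal sketch of the six-class scheme described above, using the `weight` argument of `CrossEntropyLoss`. The choice of index 0 for the "unlabelled" class, the tensor shapes, and the toy labels are assumptions made for illustration.

```python
import torch
import torch.nn as nn

UNLABELLED = 0  # assumed index of the "unlabelled" class

# Weight zero for "unlabelled", weight one for the five real classes.
class_weights = torch.tensor([0.0, 1.0, 1.0, 1.0, 1.0, 1.0])
criterion = nn.CrossEntropyLoss(weight=class_weights)

# Toy batch: 2 images, 6 output channels, 4x4 pixels.
torch.manual_seed(0)
logits = torch.randn(2, 6, 4, 4, requires_grad=True)

# Everything starts as "unlabelled"; only small patches carry a real class.
target = torch.full((2, 4, 4), UNLABELLED, dtype=torch.long)
target[0, :2, :2] = 3
target[1, 1:3, 1:3] = 5

loss = criterion(logits, target)
loss.backward()

# With all real-class weights equal to one, ignore_index gives the same value.
loss_ignore = nn.CrossEntropyLoss(ignore_index=UNLABELLED)(logits, target)
```

With the default `reduction="mean"`, the weighted loss is divided by the sum of the weights of the target pixels, so a batch containing no labelled pixels at all would divide by zero. In practice it may be worth skipping such batches.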
When I set the weight for the ‘0’ (background) label to zero, my network stopped dying after a few epochs and now gives at least some results.
This is a huge step forward. Now I should probably carefully adjust the weights for the remaining classes, based on the frequency of their appearance.
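One possible way to derive the remaining weights from class frequency is inverse-frequency weighting over the labelled pixels. This is a sketch under assumptions, not a recipe from the thread: the helper name `inverse_frequency_weights`, the normalisation, and the use of index 0 for "unlabelled" are all illustrative choices.

```python
import torch


def inverse_frequency_weights(targets, num_classes=6, unlabelled=0):
    """Hypothetical helper: weight each class by the inverse of its pixel count.

    targets: long tensor of class indices (any shape), with `unlabelled`
    marking pixels that carry no annotation. The unlabelled class and any
    class that never appears get weight zero.
    """
    counts = torch.bincount(targets.flatten(), minlength=num_classes).float()
    counts[unlabelled] = 0  # unlabelled pixels must not influence the weights
    present = counts > 0
    weights = torch.zeros(num_classes)
    # Normalised so a perfectly balanced dataset would give all ones.
    weights[present] = counts[present].sum() / (present.sum() * counts[present])
    return weights


# Toy label map: class 1 covers 6 pixels, class 2 covers 2, the rest unlabelled.
targets = torch.zeros(1, 4, 4, dtype=torch.long)
targets[0, 0, :] = 1
targets[0, 1, :2] = 1
targets[0, 2, :2] = 2

weights = inverse_frequency_weights(targets)
```

The resulting tensor can be passed as `weight=` to `CrossEntropyLoss`. The counts would normally be accumulated over the whole training set rather than a single image, and very rare classes can receive large weights, so some clipping may be needed.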