Segmentation loss and metrics: how to ignore the background class

Hello, I am training a model for multiclass semantic segmentation. I have 4 classes: background and 3 relevant classes. I only care about the 3 relevant classes. My data is imbalanced, with a lot of background pixels; computing class weights gives roughly (1.5, 40, 50, 30). I am somewhat confused about the choice of loss and metrics.
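
For reference, this is roughly how I computed those weights (a minimal inverse-frequency sketch; the exact values depend on the normalization convention, so treat the numbers above as approximate):

```python
import torch

# Sketch: inverse-frequency class weights from a stack of label masks.
# `masks` is assumed to be a LongTensor of shape (N, H, W) with values in 0..3.
def inverse_frequency_weights(masks: torch.Tensor, num_classes: int = 4) -> torch.Tensor:
    counts = torch.bincount(masks.flatten(), minlength=num_classes).float()
    freqs = counts / counts.sum()           # per-class pixel frequency
    weights = 1.0 / (freqs + 1e-8)          # rare classes get large weights
    return weights / weights.sum() * num_classes  # normalize to mean 1
```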

For the loss, I am choosing between nn.CrossEntropyLoss with class weights and the DiceLoss from segmentation_models.pytorch (GitHub - qubvel/segmentation_models.pytorch: Segmentation models with pretrained backbones). I’ve read that Dice loss should automatically account for class imbalance, so there is no need to use class weights. There is an option to select which classes DiceLoss considers. However, if I select classes (1, 2, 3) and ignore the background, I assume it will never label anything as background. Are these the best losses to consider, and am I using them correctly? Are there any other loss functions I should consider?
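
Concretely, the two candidates look like this (a minimal sketch; the weight values are my rough estimates from above, and the classes argument is the part I am unsure about):

```python
import torch
import torch.nn as nn
import segmentation_models_pytorch as smp

# Option 1: cross entropy with per-class weights (background down-weighted)
class_weights = torch.tensor([1.5, 40.0, 50.0, 30.0])
ce_loss = nn.CrossEntropyLoss(weight=class_weights)

# Option 2: Dice loss, optionally restricted to the foreground classes
dice_loss = smp.losses.DiceLoss(mode="multiclass", classes=[1, 2, 3], from_logits=True)
```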

For metrics, I use the torchmetrics Jaccard index (IoU) and Dice score (Jaccard Index — PyTorch-Metrics 0.11.4 documentation). Here I am not sure whether I should include the background in the IoU calculation. What is the correct/official way to calculate these metrics? If I set ignore_index to 0 (my background class), it affects the other IoU scores.
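
This is the kind of call I mean (a sketch with dummy tensors; in practice output would be my model’s raw logits of shape (N, C, H, W) and target the class-index labels of shape (N, H, W)):

```python
import torch
from torchmetrics.functional.classification import multiclass_jaccard_index

n_classes = 4
output = torch.randn(2, n_classes, 64, 64)         # model logits (N, C, H, W)
target = torch.randint(0, n_classes, (2, 64, 64))  # ground-truth labels (N, H, W)

preds = output.argmax(dim=1)  # predicted class indices, (N, H, W)

# Per-class IoU over all four classes
iou_all = multiclass_jaccard_index(preds, target, num_classes=n_classes, average=None)

# Per-class IoU with background (class 0) pixels excluded from the statistics
iou_fg = multiclass_jaccard_index(preds, target, num_classes=n_classes,
                                  average=None, ignore_index=0)
```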

Hi vg!

The lore I subscribe to concerning cross entropy vs. the Dice score
runs as follows:

Cross entropy with class weights works well with unbalanced data
unless you have quite a large number of classes. From recollection,
the papers advocating using the Dice score instead of cross entropy
predated the use of class weights (or at least made their comparisons
with cross entropy without class weights).

My intuition is that cross entropy’s logarithmic divergence for wrong
predictions is very helpful for training, and this is part of why it works
so well (and works better than the Dice score). (For a single pixel whose
true class is t, the cross entropy is -log (p_t), which blows up as the
predicted probability p_t goes to zero, whereas the Dice score remains
bounded.)

There are some claims that the Dice score is the better choice when
you have a large number of classes, and I have no reason to think
that this isn’t true.

Because you are working with only four classes, I would suggest that
you use cross entropy (with class weights), and only move beyond
that if you have evidence that that isn’t working well. If you do decide
to use the Dice score, I would suggest that you use it in addition to
cross entropy (to preserve the benefits of that logarithmic divergence),
rather than using it by itself. If you have enough computational power
to experiment, I would only use the Dice score by itself if you can show
that it works better than cross entropy for your use case.

Best.

K. Frank


Thanks for the response!

When you say “in addition”, do you mean my loss should be Loss = CrossEntropyLoss + DiceLoss?

Also, for my second question about the metrics in my initial post, how would you recommend I set up the metrics? Should I set ignore_index = 0 to ignore the background (like iou_per_class = multiclass_jaccard_index(output, target, num_classes=n_classes, average=None, ignore_index=0))? I assume I should not do that, since it messes up the other IoU calculations, but I want to check.

Hi vg!

Yes, although I would put in a relative weighting factor for the two losses:

Loss = CrossEntropyLoss + alpha * DiceLoss

Small values of alpha mean that you would mostly be using cross entropy
with a little bit of Dice loss mixed in, while large values mean that you would
mostly be using Dice loss. If you do this, I would start with alpha = 0.0,
and then slowly increase it to see whether an admixture of Dice loss actually
helps your training and, especially, your validation / test results.
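
In code, that might look something like this (a rough sketch assuming the
segmentation_models.pytorch DiceLoss; adapt as needed):

```python
import torch
import torch.nn as nn
import segmentation_models_pytorch as smp

class CombinedLoss(nn.Module):
    """Weighted cross entropy plus alpha times multiclass Dice loss."""

    def __init__(self, class_weights: torch.Tensor, alpha: float = 0.0):
        super().__init__()
        # class_weights should live on the same device as the logits
        self.ce = nn.CrossEntropyLoss(weight=class_weights)
        self.dice = smp.losses.DiceLoss(mode="multiclass", from_logits=True)
        self.alpha = alpha

    def forward(self, logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # logits: (N, C, H, W) raw scores; target: (N, H, W) class indices
        return self.ce(logits, target) + self.alpha * self.dice(logits, target)

# Start with alpha = 0.0 (pure cross entropy) and increase it gradually.
criterion = CombinedLoss(torch.tensor([1.5, 40.0, 50.0, 30.0]), alpha=0.0)
```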

Regarding your metrics question, I don’t really have much to say about a
single “correct/official” way to calculate them.

In general, it will depend on your use case and on how you intend to use
your inference results. Is it more important that you distinguish foreground
from background, while correctly labelling the foreground classes is only
a secondary concern? Or vice versa? Would mislabelling a background
pixel be more or less bad if it were near the border of a foreground region
than if it were out in a sea of pure background?

It’s certainly okay to compute some “standard” metrics, but you should make
a point of including some metrics that directly probe the intended use of
your model.

Best.

K. Frank
