Use CrossEntropyLoss() in multiclass semantic segmentation

I would like to know how to properly use CrossEntropyLoss() for the multiclass semantic segmentation task.

This is an implementation of FocalLoss in which I replaced the original binary_cross_entropy() with CrossEntropyLoss():

class FocalLoss(nn.Module):
    def __init__(self, weight=None, size_average=True):
        super(FocalLoss, self).__init__()

    def forward(self, inputs, targets, alpha=ALPHA, gamma=GAMMA, smooth=1):
        #inputs = torch.squeeze(inputs, 1)
        inputs = torch.as_tensor(inputs, dtype=torch.float32, device=torch.device('cuda'))
        targets = torch.as_tensor(targets, dtype=torch.float32, device=torch.device('cuda'))
        #comment out if your model contains a sigmoid or equivalent activation layer
        #inputs = torch.sigmoid(inputs)       
        #flatten label and prediction tensors
        inputs = inputs.view(-1)
        targets = targets.view(-1)
        #first compute binary cross-entropy 
        #BCE = F.binary_cross_entropy(inputs, targets, reduction='mean')
        BCE = nn.CrossEntropyLoss(inputs, targets, reduction='mean')
        BCE_EXP = torch.exp(-BCE)
        focal_loss = alpha * (1- BCE_EXP)**gamma * BCE
        return focal_loss

When I run the above script I get this error:

/usr/local/lib/python3.7/dist-packages/torch/nn/ in legacy_get_string(size_average, reduce, emit_warning)
     33         reduce = True
---> 35     if size_average and reduce:
     36         ret = 'mean'
     37     elif reduce:

RuntimeError: Boolean value of Tensor with more than one value is ambiguous

  • Should I change the target’s shape from (BxCxWxH) to (BxWxH)?
  • Is it necessary for the target to be one-hot-encoded?

Hi Simone!

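Your problem is that you are passing inputs and targets to the
constructor of CrossEntropyLoss, which expects configuration arguments
(weight, size_average, and so on) rather than data. Your targets tensor
lands in the size_average slot, hence the "Boolean value of Tensor" error.
I would recommend using the functional form (as you had been doing with
binary_cross_entropy()):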

        BCE = F.cross_entropy(inputs, targets, reduction='mean')

You could instantiate CrossEntropyLoss on the fly and then call it:

        BCE = nn.CrossEntropyLoss(reduction='mean')(inputs, targets)

but, stylistically, I prefer the functional form.
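
As a quick check that the two forms agree, here is a minimal sketch. The
tensor sizes are made up for illustration and are not taken from your
model:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical sizes: batch of 2, 3 classes, 4x5 images.
inputs = torch.randn(2, 3, 4, 5)           # raw logits, shape [B, C, W, H]
targets = torch.randint(0, 3, (2, 4, 5))   # integer class labels, shape [B, W, H]

# Functional form: data goes straight into the call.
loss_functional = F.cross_entropy(inputs, targets, reduction='mean')

# Module form: configure in the constructor, then call the instance on the data.
loss_module = nn.CrossEntropyLoss(reduction='mean')(inputs, targets)
```

Both calls should return the same scalar tensor.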

If your inputs (your predictions, i.e., the output of your model) have shape
[B, C, W, H], then your targets (which I assume you mean by “masks”)
should have shape [B, W, H] (with no class dimension), so they should not
be one-hot encoded. The values of targets should be integer (torch.long)
class labels that run from 0 to C - 1, where C is the number of classes.
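
If your masks are currently stored one-hot with shape [B, C, W, H], one way
to get class-index targets is to take the argmax over the class dimension.
This is only a sketch: the sizes and the one_hot_mask tensor are
hypothetical stand-ins for your data.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

B, C, W, H = 2, 3, 4, 5                      # hypothetical sizes
logits = torch.randn(B, C, W, H)             # stand-in for the model output
labels = torch.randint(0, C, (B, W, H))      # ground-truth class indices

# Simulate a one-hot mask of shape [B, C, W, H], as it might come from a dataset.
one_hot_mask = F.one_hot(labels, num_classes=C).permute(0, 3, 1, 2).float()

# Collapse the class dimension: shape [B, W, H], dtype torch.long.
targets = one_hot_mask.argmax(dim=1)

loss = F.cross_entropy(logits, targets)
```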

Also, your inputs should be raw-score logits, that is, the output of your
model should be the output of its final Linear (or convolutional) layer,
not followed by softmax() (or sigmoid()).
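
The reason is that cross_entropy() applies log_softmax() internally. A small
sketch with made-up tensors shows the equivalence:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

logits = torch.randn(2, 3, 4, 5)             # hypothetical raw scores, [B, C, W, H]
targets = torch.randint(0, 3, (2, 4, 5))     # class indices, [B, W, H]

# cross_entropy on raw logits ...
ce = F.cross_entropy(logits, targets)

# ... matches log_softmax over the class dimension followed by nll_loss.
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)
```

Feeding already-softmaxed values into cross_entropy() would therefore apply
the softmax twice.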

Last, as an aside, unless you have a very large number of classes,
consider using regular cross entropy as your loss criterion, using class
weights if you have a significant class imbalance in your data. There are
also claims that you are likely to get better results using a focal-loss term
as an add-on to cross-entropy compared to using focal loss alone.
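
For illustration only, here is one way a multiclass focal term could be
added to cross entropy. The per-pixel formulation, the default alpha and
gamma values, and the focal_weight mixing factor are all assumptions of
this sketch, not something taken from your code or from a specific paper
implementation:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=1.0, gamma=2.0):
    # Per-pixel cross entropy, shape [B, W, H]; alpha and gamma are assumed values.
    ce = F.cross_entropy(logits, targets, reduction='none')
    # Probability the model assigns to the correct class at each pixel.
    pt = torch.exp(-ce)
    # Down-weight well-classified pixels, then average.
    return (alpha * (1.0 - pt) ** gamma * ce).mean()

def ce_plus_focal(logits, targets, focal_weight=0.5):
    # focal_weight is a hypothetical mixing factor you would need to tune.
    return F.cross_entropy(logits, targets) + focal_weight * focal_loss(logits, targets)

torch.manual_seed(0)
logits = torch.randn(2, 3, 4, 5)             # hypothetical logits, [B, C, W, H]
targets = torch.randint(0, 3, (2, 4, 5))     # class indices, [B, W, H]
loss = ce_plus_focal(logits, targets)
```

Note that this applies the focal factor to each pixel before averaging,
which is not the same as applying it to the already-averaged loss.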


K. Frank

@KFrank Thanks for the advice!
Would you suggest using cross-entropy with weights instead of focal-loss?

Hi Simone!

I would certainly suggest at least trying cross entropy. If you have a
significant class imbalance, then try using CrossEntropyLoss’s weight
constructor argument.
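
A minimal sketch of class weighting follows. The inverse-frequency scheme
and the toy targets are my assumptions for the example; other weighting
schemes may suit your data better:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

C = 3
logits = torch.randn(2, C, 4, 5)                   # hypothetical logits, [B, C, W, H]

# Toy imbalanced targets: mostly class 0, a few pixels of classes 1 and 2.
targets = torch.zeros(2, 4, 5, dtype=torch.long)
targets[0, 0, :3] = 1
targets[1, 1, 0] = 2

# Inverse-frequency weights (an assumed scheme): rarer classes get larger weights.
counts = torch.bincount(targets.flatten(), minlength=C).float()
weights = counts.sum() / (C * counts.clamp(min=1))

loss_module = nn.CrossEntropyLoss(weight=weights)(logits, targets)
loss_functional = F.cross_entropy(logits, targets, weight=weights)
```

In practice you would compute the counts once over the whole training set
rather than per batch.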

Focal loss is a perfectly reasonable loss criterion, and could well work for
you, especially if you have a very large number of classes. (How many
classes do you have?) But I would only use it if you can show that it gives
you better results than cross entropy for your specific use case.


K. Frank

I have only 3 classes, so I will try CrossEntropyLoss as suggested.

I would also like to try dice loss, but I am not sure how well it works with a significant class imbalance such as mine.