Loss for Multi-Label Segmentation

Hi everybody,

I have the following scenario: I have 4 classes (including background): “House”, “Door”, “Window”, and “Background”. The two classes “Door” and “Window” obviously do not intersect, but both lie inside the “House” class.

First, I subtracted the “Window” and “Door” masks from the “House” class and used a multi-class segmentation approach with CrossEntropyLoss, which uses softmax. Now I would like to change it to a multi-label setup where the “Door” and “Window” pixels are also labeled as “House”.

For that, I wanted to use BCEWithLogitsLoss, which uses sigmoid, but I don’t know how I can balance the classes. I have a list of weights that represent the average size of the classes, with “House” having the smallest weight as it is the biggest class, e.g. {'House': 1, 'Door': 20, 'Window': 25}. In the multi-class scenario I passed this information to the weight argument of CrossEntropyLoss.
But how do I do it with BCEWithLogitsLoss? Do I have to pass the list [1, 20, 25] to pos_weight for that?

Also, is my approach even the right way to do Multi-Label Segmentation?

Thanks in advance.

The general approach of using nn.BCEWithLogitsLoss sounds reasonable for a multi-label segmentation, where each pixel might belong to more than a single class.
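As a sketch of what such a multi-label target could look like (the mask names and toy shapes here are hypothetical placeholders, not from your pipeline), the “House” channel would simply keep the door and window pixels positive instead of having them subtracted:

```python
import torch

# Hypothetical toy masks of shape (H, W) -- replace with your real annotations
H, W = 4, 4
door_mask = torch.zeros(H, W)
door_mask[0, 0] = 1.
window_mask = torch.zeros(H, W)
window_mask[1, 1] = 1.
house_mask = torch.zeros(H, W)
house_mask[:2, :2] = 1.  # the house region covers the door and window pixels

# Keep door/window pixels positive in the "House" channel instead of subtracting them
house_channel = torch.clamp(house_mask + door_mask + window_mask, max=1.)
target = torch.stack([house_channel, door_mask, window_mask], dim=0)  # (C, H, W)

print(target[:, 0, 0])  # the door pixel is positive in both "House" and "Door"
```

With sigmoid outputs, the background class can stay implicit as the all-zeros label, so an explicit background channel may not be needed.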

You could pass the pos_weight argument, but note that it’s defined as nb_negative/nb_positive. I’m not sure if your current weights are calculated this way.
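If the weights should follow that definition, they could be computed directly from the training masks; a sketch using random data as a stand-in for the real targets:

```python
import torch

torch.manual_seed(0)
# Hypothetical binary multi-label target of shape (N, C, H, W)
target = torch.round(torch.rand(8, 3, 16, 16))

# pos_weight per class: nb_negative / nb_positive over all pixels in that channel
nb_pos = target.sum(dim=(0, 2, 3))
nb_total = target.numel() / target.size(1)  # number of pixels per channel
pos_weight = (nb_total - nb_pos) / nb_pos
print(pos_weight)  # roughly 1.0 per class for balanced random data
```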

Thank you for the reply. Unfortunately, since I can’t check the documentation of torch.binary_cross_entropy_with_logits, I couldn’t quite figure out the necessary shape and form of pos_weight. I tried the following, but I don’t know if it is the correct approach:

def bce_with_logits_loss(logit, target, pos_weight):
    # per-element loss of shape (N, C, H, W)
    criterion = nn.BCEWithLogitsLoss(reduction='none')
    loss = criterion(logit, target)
    # broadcast the per-class weights over the batch and spatial dims
    loss = loss * pos_weight[None, :, None, None]
    return loss.mean()

You can see the docs here:

pos_weight ( Tensor , optional ) – a weight of positive examples. Must be a vector with length equal to the number of classes.

You can pass pos_weight as an argument to the creation of nn.BCEWithLogitsLoss instead of manually applying it.

Sorry, I was talking about not being able to see the actual implementation of torch.binary_cross_entropy_with_logits. Unfortunately, passing pos_weight as a list of weights for each class doesn’t work. Here is an example:

# Random input and target of shape (N, C, H, W)
# Inputs are logits and target is binary
input = torch.rand(1, 3, 2, 2)
target = torch.round(torch.rand(1, 3, 2, 2))

criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([1., 1., 1.]))
loss = criterion(input, target)

"""
raises RuntimeError: The size of tensor a (3) must match the size 
of tensor b (2) at non-singleton dimension 3
"""

I also tried the channels-last variant with tensors shaped (N, H, W, C). That works, but outputs a different result than I expected. It is hard to understand the behavior of pos_weight in this segmentation setting.

You can expand the weights for a segmentation use case:

input = torch.rand(1, 3, 2, 2)
target = torch.round(torch.rand(1, 3, 2, 2))

weight = torch.tensor([1., 1., 1.]).view(1, 3, 1, 1).expand(-1, -1, 2, 2)

criterion = nn.BCEWithLogitsLoss(pos_weight=weight)
loss = criterion(input, target)

The implementation of binary_cross_entropy_with_logits can be found here.
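A note on the shape: since pos_weight only needs to be broadcastable against the target, a plain view to (C, 1, 1) should work as well, without the explicit expand (a sketch):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
input = torch.rand(1, 3, 2, 2)
target = torch.round(torch.rand(1, 3, 2, 2))

# a (C, 1, 1) view broadcasts against (N, C, H, W) without explicit expansion
weight = torch.tensor([1., 1., 1.]).view(3, 1, 1)

criterion = nn.BCEWithLogitsLoss(pos_weight=weight)
loss = criterion(input, target)
print(loss)
```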

Thank you for the link to the implementation.
Unfortunately, the solution you provided doesn’t work as expected.
For example, setting the last weight to 1000 like so:

input = torch.rand(1, 3, 2, 2)
target = torch.round(torch.rand(1, 3, 2, 2))

weight = torch.tensor([1., 1., 1000.]).view(1, 3, 1, 1).expand(-1, -1, 2, 2)

criterion = nn.BCEWithLogitsLoss(pos_weight=weight)
loss = criterion(input, target)

doesn’t change the loss at all.

It seems to work for me:

input = torch.rand(1, 3, 2, 2)
target = torch.round(torch.rand(1, 3, 2, 2))

weight = torch.tensor([1., 1., 1000.]).view(1, 3, 1, 1).expand(-1, -1, 2, 2)

weighted_criterion = nn.BCEWithLogitsLoss(pos_weight=weight)
criterion = nn.BCEWithLogitsLoss()

weighted_loss = weighted_criterion(input, target)
loss = criterion(input, target)

print(weighted_loss)
> tensor(75.8810)
print(loss)
> tensor(0.7734)

Interesting. If I set torch.manual_seed(1), it doesn’t have an effect. If I leave it out, it does something, but not what I would expect:

input = torch.rand(1, 3, 2, 2)
target = torch.round(torch.rand(1, 3, 2, 2))

weight = torch.tensor([1., 1., 1000.]).view(1, 3, 1, 1).expand(-1, -1, 2, 2)

weighted_criterion = nn.BCEWithLogitsLoss(reduction='none', pos_weight=weight)
criterion = nn.BCEWithLogitsLoss(reduction='none')

loss = criterion(input, target)
weighted_loss = weighted_criterion(input, target)

print(loss)
> tensor([[[[0.9330, 0.8368],
          [1.2952, 0.6551]],
         [[0.3611, 0.3950],
          [1.0079, 0.7523]],
         [[0.6751, 0.3964],
          [0.9588, 0.7661]]]])
print(weighted_loss)
> tensor([[[[0.9330, 0.8368],
          [1.2952, 0.6551]],
         [[0.3611, 0.3950],
          [1.0079, 0.7523]],
         [[675.08, 396.43],
          [0.95880, 0.76611]]]])

It only changed the upper two pixels of the last channel.
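That behavior matches the documented formula: pos_weight scales only the positive term of the loss, i.e. loss = -(pos_weight * y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))), so only pixels whose target is 1 in the weighted channel change, while pixels with target 0 keep their unweighted loss. That would also explain why a seed under which the last channel happens to contain no positive pixels shows no effect at all. A quick check with random data:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(1, 3, 2, 2)
target = torch.round(torch.rand(1, 3, 2, 2))

pos_weight = torch.tensor([1., 1., 1000.]).view(3, 1, 1)

plain = nn.BCEWithLogitsLoss(reduction='none')(logits, target)
weighted = nn.BCEWithLogitsLoss(reduction='none', pos_weight=pos_weight)(logits, target)

# entries with target == 0 are untouched by pos_weight ...
pos_mask = target.bool()
print(torch.allclose(weighted[~pos_mask], plain[~pos_mask]))

# ... while positives in the last channel are scaled by 1000
ch2 = pos_mask[0, 2]
print(torch.allclose(weighted[0, 2][ch2], 1000 * plain[0, 2][ch2]))
```

If the goal is to weight every pixel of a class rather than only the positives, the manual multiplication of the unreduced loss (as in the earlier custom loss function) would be the way to go instead.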