Loss for Multi-Label Segmentation

Hi everybody,

I have the following scenario: I have 4 classes (including background): “House”, “Door”, “Window”, and “Background”. The two classes “Door” and “Window” obviously do not intersect, but both lie inside the “House” class.

First, I subtracted the “Window” and “Door” masks from the “House” class and used a multi-class segmentation approach with CrossEntropyLoss, which uses softmax. Now I would like to change it to a multi-label setup where the “Door” and “Window” pixels are also labeled as “House”.

For that, I wanted to use BCEWithLogitsLoss, which uses sigmoid, but I don’t know how I can balance the classes. I have a list of weights that represent the average size of the classes, with “House” having the smallest weight as it is the biggest class, e.g. {'House': 1, 'Door': 20, 'Window': 25}. In the multi-class scenario I passed this information to the weight argument of CrossEntropyLoss.
But how do I do it with BCEWithLogitsLoss? Do I have to pass the list [1, 20, 25] to pos_weight for that?

Also, is my approach even the right way to do Multi-Label Segmentation?

Thanks in advance.

The general approach of using nn.BCEWithLogitsLoss sounds reasonable for a multi-label segmentation, where each pixel might belong to more than a single class.
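As a sketch of what such a multi-label target could look like (the mask names and toy shapes here are hypothetical placeholders, not from your pipeline), the “House” channel would simply keep the door and window pixels positive instead of having them subtracted:

```python
import torch

# Hypothetical toy masks of shape (H, W) -- replace with your real annotations
H, W = 4, 4
door_mask = torch.zeros(H, W)
door_mask[0, 0] = 1.
window_mask = torch.zeros(H, W)
window_mask[1, 1] = 1.
house_mask = torch.zeros(H, W)
house_mask[:2, :2] = 1.  # the house region covers the door and window pixels

# Keep door/window pixels positive in the "House" channel instead of subtracting them
house_channel = torch.clamp(house_mask + door_mask + window_mask, max=1.)
target = torch.stack([house_channel, door_mask, window_mask], dim=0)  # (C, H, W)

print(target[:, 0, 0])  # the door pixel is positive in both "House" and "Door"
```

With sigmoid outputs, the background class can stay implicit as the all-zeros label, so an explicit background channel may not be needed.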

You could pass the pos_weight argument, but note that it’s defined as nb_negative/nb_positive. I’m not sure if your current weights are calculated this way.
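If the weights should follow that definition, they could be computed directly from the training masks; a sketch using random data as a stand-in for the real targets:

```python
import torch

torch.manual_seed(0)
# Hypothetical binary multi-label target of shape (N, C, H, W)
target = torch.round(torch.rand(8, 3, 16, 16))

# pos_weight per class: nb_negative / nb_positive over all pixels in that channel
nb_pos = target.sum(dim=(0, 2, 3))
nb_total = target.numel() / target.size(1)  # number of pixels per channel
pos_weight = (nb_total - nb_pos) / nb_pos
print(pos_weight)  # roughly 1.0 per class for balanced random data
```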

Thank you for the reply. Unfortunately, since I can’t check the documentation of torch.binary_cross_entropy_with_logits, I couldn’t quite figure out the necessary shape and form of pos_weight. I tried the following, but I don’t know if it is the correct approach:

def bce_with_logits_loss(logit, target, pos_weight):
    # per-element loss of shape (N, C, H, W)
    criterion = nn.BCEWithLogitsLoss(reduction='none')
    loss = criterion(logit, target)
    # broadcast the per-class weights over the batch and spatial dims
    loss = loss * pos_weight[None, :, None, None]
    return loss.mean()

You can see the docs here:

pos_weight ( Tensor , optional ) – a weight of positive examples. Must be a vector with length equal to the number of classes.

You can pass pos_weight as an argument to the creation of nn.BCEWithLogitsLoss instead of manually applying it.

Sorry, I was talking about not being able to see the actual implementation of torch.binary_cross_entropy_with_logits. Unfortunately, passing pos_weight as a list of weights for each class doesn’t work. Here is an example:

# Random input and target of shape (N, C, H, W)
# Inputs are logits and target is binary
input = torch.rand(1, 3, 2, 2)
target = torch.round(torch.rand(1, 3, 2, 2))

criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([1., 1., 1.]))
loss = criterion(input, target)

"""
raises RuntimeError: The size of tensor a (3) must match the size 
of tensor b (2) at non-singleton dimension 3
"""

I also tried the channels-last variant with tensors shaped (N, H, W, C). That works, but outputs a different result than I expected. It is hard to understand the behavior of pos_weight in this segmentation setting.

You can expand the weights for a segmentation use case:

input = torch.rand(1, 3, 2, 2)
target = torch.round(torch.rand(1, 3, 2, 2))

weight = torch.tensor([1., 1., 1.]).view(1, 3, 1, 1).expand(-1, -1, 2, 2)

criterion = nn.BCEWithLogitsLoss(pos_weight=weight)
loss = criterion(input, target)

The implementation of binary_cross_entropy_with_logits can be found here.
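A note on the shape: since pos_weight only needs to be broadcastable against the target, a plain view to (C, 1, 1) should work as well, without the explicit expand (a sketch):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
input = torch.rand(1, 3, 2, 2)
target = torch.round(torch.rand(1, 3, 2, 2))

# a (C, 1, 1) view broadcasts against (N, C, H, W) without explicit expansion
weight = torch.tensor([1., 1., 1.]).view(3, 1, 1)

criterion = nn.BCEWithLogitsLoss(pos_weight=weight)
loss = criterion(input, target)
print(loss)
```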

Thank you for the link to the implementation.
Unfortunately, the solution you provided doesn’t work as expected.
For example, setting the last weight to 1000 like so:

input = torch.rand(1, 3, 2, 2)
target = torch.round(torch.rand(1, 3, 2, 2))

weight = torch.tensor([1., 1., 1000.]).view(1, 3, 1, 1).expand(-1, -1, 2, 2)

criterion = nn.BCEWithLogitsLoss(pos_weight=weight)
loss = criterion(input, target)

doesn’t change the loss at all.

It seems to work for me:

input = torch.rand(1, 3, 2, 2)
target = torch.round(torch.rand(1, 3, 2, 2))

weight = torch.tensor([1., 1., 1000.]).view(1, 3, 1, 1).expand(-1, -1, 2, 2)

weighted_criterion = nn.BCEWithLogitsLoss(pos_weight=weight)
criterion = nn.BCEWithLogitsLoss()

weighted_loss = weighted_criterion(input, target)
loss = criterion(input, target)

print(weighted_loss)
> tensor(75.8810)
print(loss)
> tensor(0.7734)

Interesting. If I set torch.manual_seed(1), it doesn’t have an effect. If I leave it out, it does something, but not what I would expect:

input = torch.rand(1, 3, 2, 2)
target = torch.round(torch.rand(1, 3, 2, 2))

weight = torch.tensor([1., 1., 1000.]).view(1, 3, 1, 1).expand(-1, -1, 2, 2)

weighted_criterion = nn.BCEWithLogitsLoss(reduction='none', pos_weight=weight)
criterion = nn.BCEWithLogitsLoss(reduction='none')

loss = criterion(input, target)
weighted_loss = weighted_criterion(input, target)

print(loss)
> tensor([[[[0.9330, 0.8368],
          [1.2952, 0.6551]],
         [[0.3611, 0.3950],
          [1.0079, 0.7523]],
         [[0.6751, 0.3964],
          [0.9588, 0.7661]]]])
print(weighted_loss)
> tensor([[[[0.9330, 0.8368],
          [1.2952, 0.6551]],
         [[0.3611, 0.3950],
          [1.0079, 0.7523]],
         [[675.08, 396.43],
          [0.95880, 0.76611]]]])

It only changed the upper two pixels of the last channel.
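That behavior matches the documented formula: pos_weight scales only the positive term of the loss, i.e. loss = -(pos_weight * y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))), so only pixels whose target is 1 in the weighted channel change, while pixels with target 0 keep their unweighted loss. That would also explain why a seed under which the last channel happens to contain no positive pixels shows no effect at all. A quick check with random data:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(1, 3, 2, 2)
target = torch.round(torch.rand(1, 3, 2, 2))

pos_weight = torch.tensor([1., 1., 1000.]).view(3, 1, 1)

plain = nn.BCEWithLogitsLoss(reduction='none')(logits, target)
weighted = nn.BCEWithLogitsLoss(reduction='none', pos_weight=pos_weight)(logits, target)

# entries with target == 0 are untouched by pos_weight ...
pos_mask = target.bool()
print(torch.allclose(weighted[~pos_mask], plain[~pos_mask]))

# ... while positives in the last channel are scaled by 1000
ch2 = pos_mask[0, 2]
print(torch.allclose(weighted[0, 2][ch2], 1000 * plain[0, 2][ch2]))
```

If the goal is to weight every pixel of a class rather than only the positives, the manual multiplication of the unreduced loss (as in the earlier custom loss function) would be the way to go instead.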