Understanding the weight parameter of nn.NLLLoss()

Hi everyone,

I want to use nn.NLLLoss() to implement the focal loss for a binary segmentation problem, using the logits obtained from a network and binary targets (i.e., 0 for the -ve class and 1 for the +ve class). The following cases summarize my failed attempts (T_T)

# code for reproducibility
import torch
from torch import nn

log_prob = torch.rand((8, 1, 128, 128)).view(8*128*128, 1)      # (N,C,d1,d2, ..., dK) --> (N*d1*...*dK,C)
target = (torch.rand((8, 1, 128, 128)) > 0.5).long().view(-1)   # (N,d1,d2,...,dK) --> (N*d1*...*dK,)

Case 1: No weight OR weight for the positive class only.
I have no idea what caused this error, since the target values are within {0, 1}.

>> nn.NLLLoss()(log_prob, target)

OR

>> weight = torch.tensor([44.0])
>> nn.NLLLoss(weight)(log_prob, target)

"IndexError: Target 1 is out of bounds."

Case 2: Weights for the negative and positive classes.
This time the input tensors were kept the same and weights for both classes were passed. From the following error, I understood that nn.NLLLoss() requires a weight for each pixel index (i.e., Case 3).

>> weight = torch.tensor([1.0, 44.0])    # (-ve, +ve) class weights
>> nn.NLLLoss(weight=weight)(log_prob, target)

"RuntimeError: weight tensor should be defined either for all 1 classes or no classes but got weight tensor of shape: [2]"

Case 3: Weight per pixel.
This time the loss function again had a problem with the weights.

>> weight = target * 44.0
>> nn.NLLLoss(weight=weight)(log_prob, target)

"RuntimeError: weight tensor should be defined either for all 1 classes or no classes but got weight tensor of shape: [131072]"

I would highly appreciate it if you could explain how this weight parameter is designed to work. A comparison with the weight or pos_weight parameter of nn.BCEWithLogitsLoss() would be a huge plus.

-----------------------------------------------------------------------

P.S. To randomly generate the binary target tensor, I first tried target = torch.rand((8, 1, 128, 128), dtype=torch.bool), which caused the error RuntimeError: "check_uniform_bounds" not implemented for 'Bool'. Isn't the random generation of a bool-type tensor supported?

To create random bools, use:

shape = (10, 10)
random_bools = torch.rand(shape) > 0.5   # uniform floats thresholded to bools
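An equivalent alternative (just another option, not the only one) is to sample integers and cast them:

```python
import torch

# Sample {0, 1} integers, then cast to bool
random_bools = torch.randint(0, 2, (10, 10)).bool()
```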

nn.NLLLoss expects the model output to contain log probabilities in the shape [batch_size, nb_classes, *]. If your model output has the shape [batch_size, 1], only a single class is valid (class index 0), and your use case is invalid since the model can only ever predict that one class. Fix this, and passing one weight value per class should work.
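A minimal sketch of that two-class setup, reusing the 8x128x128 shapes and the 44.0 weight from the question as assumptions:

```python
import torch
from torch import nn

# Sketch: with two output channels (one per class),
# a weight tensor with one entry per class is accepted.
logits = torch.randn(8, 2, 128, 128)             # (N, C=2, H, W) raw model outputs
log_prob = nn.LogSoftmax(dim=1)(logits)          # log probabilities over the class dim
target = (torch.rand(8, 128, 128) > 0.5).long()  # (N, H, W), values in {0, 1}
weight = torch.tensor([1.0, 44.0])               # one weight per class: (-ve, +ve)
loss = nn.NLLLoss(weight=weight)(log_prob, target)
```

Note that nn.NLLLoss accepts the (N, C, d1, d2) input with an (N, d1, d2) target directly, so the view/flatten step from the question is not needed here.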

That’s what I called Case 1 (no weight, or a weight for the positive class only) and, strangely enough, it raises IndexError: Target 1 is out of bounds given a binary target tensor. You may use the reproducible code above to get an insight into this.

Thanks for mentioning the log probabilities, as I had forgotten to apply nn.LogSigmoid() in my implementation. I hope LogSigmoid is a better choice than LogSoftmax for a binary segmentation problem :interrobang:

Shouldn’t it be class index 1, since the model output logits are converted into probabilities of being a foreground pixel (denoted as class 1)?

That’s expected as explained in my previous post. If you want to use nn.NLLLoss for a binary classification the model output should have the shape [batch_size, 2].

No, since nn.NLLLoss is used for multi-class classification use cases. A better choice could be nn.BCEWithLogitsLoss as this criterion is used for binary or multi-label use cases.
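For comparison, a sketch of the nn.BCEWithLogitsLoss route; the 44.0 pos_weight value is carried over from the question as an assumption:

```python
import torch
from torch import nn

# Sketch: single-channel logits, float targets, and pos_weight
# scaling the positive-class term (analogous to the 44.0 class weight above).
logits = torch.randn(8, 1, 128, 128)                 # raw outputs, no sigmoid applied
target = (torch.rand(8, 1, 128, 128) > 0.5).float()  # BCE targets must be float
criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([44.0]))
loss = criterion(logits, target)
```

Since the sigmoid is fused into the loss, no nn.LogSigmoid() (or any activation) should be applied to the model output before passing it in.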