# Understanding the weight parameter of nn.NLLLoss()

Hi everyone,

I want to use `nn.NLLLoss()` to implement the Focal Loss function for a binary segmentation problem. The logits obtained from a network and binary targets (i.e., 0 for the negative class and 1 for the positive class) are used. The following cases summarize my failed attempts (T_T)

```python
# code for reproducibility
import torch
from torch import nn

log_prob = torch.rand((8, 1, 128, 128)).view(8*128*128, 1)      # (N,C,d1,d2,...,dK) --> (N*d1*...*dK, C)
target = (torch.rand((8, 1, 128, 128)) > 0.5).long().view(-1)   # (N,d1,d2,...,dK) --> (N*d1*...*dK,)
```

Case 1: No weight OR weight for the positive class only.
I have no idea what caused this error, since the target values are within {0, 1}.

```python
>> nn.NLLLoss()(log_prob, target)
```

OR

```python
>> weight = torch.tensor([44.0])
>> nn.NLLLoss(weight)(log_prob, target)

"IndexError: Target 1 is out of bounds."
```

Case 2: Weights for the negative and positive classes.
This time the input tensors were kept the same and weights for both classes were passed. What I understood from the following error is that `nn.NLLLoss()` requires a weight for each pixel index (i.e., Case 3).

```python
>> weight = torch.tensor([1.0, 44.0])    # (-ve, +ve) class weights
>> nn.NLLLoss(weight=weight)(log_prob, target)

"RuntimeError: weight tensor should be defined either for all 1 classes or no classes but got weight tensor of shape: "
```

Case 3: Weight per pixel.
This time the loss function again had a problem with the weights.

```python
>> weight = target * 44.0
>> nn.NLLLoss(weight=weight)(log_prob, target)

"RuntimeError: weight tensor should be defined either for all 1 classes or no classes but got weight tensor of shape: "
```

I would highly appreciate it if you could explain how this `weight` parameter is designed to work. A comparison with the `weight` or `pos_weight` parameter of `nn.BCEWithLogitsLoss()` would be a huge plus.

---

P.S. To randomly generate the binary target tensor, I first used `target = torch.rand((8, 1, 128, 128), dtype=torch.bool)`, which raised `RuntimeError: "check_uniform_bounds" not implemented for 'Bool'`. Isn't random generation of a bool-type tensor supported?

To create random bools, use:

```python
shape = (10, 10)
random_bools = torch.rand(shape) > 0.5
```

`nn.NLLLoss` expects a model output containing log probabilities in the shape `[batch_size, nb_classes, *]`. If your model output has the shape `[batch_size, 1]`, only a single class is valid (class index 0), and your use case is invalid since the model can only ever predict one class. Fix this, and passing a weight value for each class should work.
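For reference, a minimal sketch of a valid call once the output has one log-probability column per class (the shapes and the 44.0 weight just mirror the question and are only illustrative):

```python
import torch
from torch import nn

logits = torch.randn(8 * 128 * 128, 2)           # (N, C) with C == 2 classes
log_prob = torch.log_softmax(logits, dim=1)      # proper log probabilities
target = torch.randint(0, 2, (8 * 128 * 128,))   # class indices 0 or 1

weight = torch.tensor([1.0, 44.0])               # exactly one weight per class
loss = nn.NLLLoss(weight=weight)(log_prob, target)  # scalar (mean reduction)
```

With `C == 2` channels, a weight tensor of shape `[2]` is accepted, which is why Case 2 failed for a single-channel output: the weight must match the class dimension, not the pixels.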

That’s what I called `Case 1: No weight OR weight for the positive class only` and, strangely enough, it raises the error `IndexError: Target 1 is out of bounds` given a binary `target` tensor. You may run the reproducible code above to see this.

Thanks for mentioning the log probabilities; I forgot to apply `nn.LogSigmoid()` in my implementation. I hope `LogSigmoid` is a better choice than `LogSoftmax` for a binary segmentation problem. Shouldn’t it be class index 1, since the model’s output logits are converted into the probabilities of a pixel being foreground (denoted as class 1)?

That’s expected, as explained in my previous post. If you want to use `nn.NLLLoss` for a binary classification, the model output should have the shape `[batch_size, 2]`.
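For the segmentation case, a sketch under the assumption that the network is changed to emit two channels: `nn.NLLLoss` also accepts the unflattened `(N, C, d1, ..., dK)` layout directly, so the 4-D tensors from the question never need to be reshaped:

```python
import torch
from torch import nn

logits = torch.randn(8, 2, 128, 128)              # (N, C, H, W), two class channels
log_prob = nn.LogSoftmax(dim=1)(logits)           # log probabilities over the class dim
target = (torch.rand(8, 128, 128) > 0.5).long()   # (N, H, W) class indices in {0, 1}

weight = torch.tensor([1.0, 44.0])                # (-ve, +ve) class weights
loss = nn.NLLLoss(weight=weight)(log_prob, target)
```

Note that the target has no channel dimension here; `nn.NLLLoss` indexes the class channel of the input with the target values.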

No, since `nn.NLLLoss` is used for multi-class classification use cases. A better choice would be `nn.BCEWithLogitsLoss`, as this criterion is designed for binary and multi-label use cases.
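A minimal sketch of that alternative, keeping the single-channel output from the question and reusing the illustrative 44.0 as `pos_weight` (which rescales the loss contribution of positive pixels):

```python
import torch
from torch import nn

logits = torch.randn(8, 1, 128, 128)                 # raw logits, single channel
target = (torch.rand(8, 1, 128, 128) > 0.5).float()  # float targets in {0., 1.}

# pos_weight up-weights the positive (foreground) pixels;
# 44.0 is only illustrative, matching the weight from the question.
criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([44.0]))
loss = criterion(logits, target)
```

`nn.BCEWithLogitsLoss` applies the sigmoid internally, so raw logits are passed in and no `nn.LogSigmoid()` is needed.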