# Understanding NLLLoss function

``````
import torch
import torch.nn as nn

m = nn.LogSoftmax(dim=1)
loss = nn.NLLLoss()
# input is of size N x C = 1 x 3
input = torch.ones(1, 3)  # equal logits for each class
# each element in target has to have 0 <= value < C
target = torch.tensor([1])
output = loss(m(input), target)
print(output)
``````

I read the documentation, but it’s not clear. Can someone explain the math behind this example?


In your example your output has the same “probability” for all three classes, i.e. the logits have the same value.
Their probabilities should therefore be approx `[0.33, 0.33, 0.33]`.
Since you are using `LogSoftmax`, we can check if this is true by calling `exp` on it (thus getting rid of the `log`):

``````
print(m(input))
print(m(input).exp())
``````

You will get the same values every time you pass the same logits into `LogSoftmax`.
Now we just have to pick the right index using `target`, multiply by `-1`, and we end up with a loss value of `1.0986`.
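For completeness, here is a minimal worked check (a sketch assuming equal logits, e.g. `input = torch.ones(1, 3)`, as in the example above):

``````
import torch
import torch.nn as nn

input = torch.ones(1, 3)                 # equal logits for all three classes
log_probs = nn.LogSoftmax(dim=1)(input)  # each entry is ln(1/3) ≈ -1.0986
target = torch.tensor([1])

# NLLLoss picks the log probability at the target index and negates it
print(-log_probs[0, 1])                 # tensor(1.0986)
print(nn.NLLLoss()(log_probs, target))  # tensor(1.0986)
``````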

``````
loss = nn.NLLLoss()
a = torch.tensor([[0.88, 0.12], [0.51, 0.49]], dtype=torch.float)
target = torch.tensor([1, 0])
output = loss(a, target)
print(output)
``````

Don’t know if it’s right to post this question here, but I’m trying: why is the output of this piece of code `tensor(-0.3150)`? I was expecting it to be `(-1/2) * (1 * ln(0.88) + 0 * ln(0.12) + 1 * ln(0.51) + 0 * ln(0.49))`, which would be equal to `0.4005`, not `-0.3150`.

I found the formula for the log likelihood here.


`nn.NLLLoss` expects the inputs to be log probabilities, while you are passing the probabilities into the criterion.

Also, your manual calculation seems to mix up the target indices, as the first sample has class1 as its target and the second one class0.

Here is an example showing the same result:

``````
loss = nn.NLLLoss()
a = torch.tensor([[0.88, 0.12], [0.51, 0.49]], dtype=torch.float)
target = torch.tensor([1, 0])
output = loss(torch.log(a), target)
print(output)
> tensor(1.3968)
print((-torch.log(a[0, 1]) - torch.log(a[1, 0])) / 2)
> tensor(1.3968)
``````

Ahh ok, thanks for the answer! What I am actually trying to figure out is how `nn.NLLLoss` works for multidimensional tensors, but I couldn’t find an example. Could you give me a simple example of how the loss is calculated for a 2D or 3D tensor?

Hi Calin!

Please see (if I understand what you are asking) the description of
the “K-dimensional case” in the documentation for NLLLoss.

Here is an illustrative (PyTorch 0.3.0) script:

``````
import torch
torch.__version__

torch.manual_seed (2020)

nBatch = 2
nClass = 4
width = 3
height = 5
input = torch.randn (nBatch, nClass, width, height)
target = torch.multinomial (torch.ones (nClass) / nClass, nBatch * width * height, replacement = True).resize_ (nBatch, width, height)

input.shape
target.shape
target.min()
target.max()

input = torch.nn.functional.log_softmax (input, dim = 1)

torch.nn.NLLLoss() (input, target)
``````

And here is the output:

``````
>>> import torch
>>> torch.__version__
'0.3.0b0+591e73e'
>>>
>>> torch.manual_seed (2020)
<torch._C.Generator object at 0x00000170D6456630>
>>>
>>> nBatch = 2
>>> nClass = 4
>>> width = 3
>>> height = 5
>>> input = torch.randn (nBatch, nClass, width, height)
>>> target = torch.multinomial (torch.ones (nClass) / nClass, nBatch * width * height, replacement = True).resize_ (nBatch, width, height)
>>>
>>> input.shape
torch.Size([2, 4, 3, 5])
>>> target.shape
torch.Size([2, 3, 5])
>>> target.min()
0
>>> target.max()
3
>>>
>>> input = torch.nn.functional.log_softmax (input, dim = 1)
>>>
>>> torch.nn.NLLLoss() (input, target)
Variable containing:
1.9742
[torch.FloatTensor of size 1]
``````

Note that `target` has one less dimension than `input`. In particular,
`target` does not have an `nClass` dimension, while `input` does.
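In current PyTorch versions you can also reproduce the loss by hand with `gather` (a sketch, assuming `input` already holds the log probabilities from the script above):

``````
# Manually reproduce NLLLoss for the K-dimensional case:
# select the log probability at each target index along the class
# dimension (dim=1), negate, and average over all elements.
picked = input.gather(1, target.unsqueeze(1)).squeeze(1)  # shape [nBatch, width, height]
print(-picked.mean())  # matches torch.nn.NLLLoss()(input, target)
``````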

Best.

K. Frank


Hi Frank, so I took a simpler example to try to understand it:

``````
m = nn.LogSoftmax(dim=1)
loss = nn.NLLLoss()
# input is of size N x C = 2 x 2
input = torch.randn(2, 2)  # random logits
# each element in target has to have 0 <= value < C
target = torch.tensor([1, 0])
output = loss(m(input), target)
print(m(input))
print(target)
print(output)
``````

One of its outputs was:

``````
tensor([[-1.1722, -0.3706],
        [-0.5150, -0.9101]])
tensor([1, 0])
tensor(0.4428)
``````

So what is the formula used in this example to get the value 0.4428 from the given input and target? It’s not really clear to me how the l1, …, lN terms from the l(x, y) formula (official NLLLoss documentation) are calculated, since the loss weight is None.

Thanks, Calin!

Hello Calin!

`0.4428 = -(-0.3706 + -0.5150) / 2`.

That is, the value of your `output` is the average of the losses for each
of the two samples in your batch.

Quoting from the NLLLoss documentation:

weight ( Tensor , optional ) – a manual rescaling weight given to each class. If given, it has to be a Tensor of size C. Otherwise, it is treated as if having all ones.

Optional means that the argument is allowed to be `None`, i.e., absent.
In such a case there is no reweighting (or, equivalently, the reweighting
factors are all equal to `1`).
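You can also see the two per-sample terms directly by disabling the averaging with `reduction='none'` (a sketch using the log probabilities reconstructed from your run above):

``````
import torch
import torch.nn as nn

log_probs = torch.tensor([[-1.1722, -0.3706],
                          [-0.5150, -0.9101]])
target = torch.tensor([1, 0])

per_sample = nn.NLLLoss(reduction='none')(log_probs, target)
print(per_sample)         # tensor([0.3706, 0.5150])
print(per_sample.mean())  # tensor(0.4428)
``````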

Best.

K. Frank

Thank you very much, Frank!

It’s finally clear now.

I don’t know if I can post a question here, but it’d be great to find a solution because, honestly, I don’t understand what’s wrong here:

``````
EPOCHS = 3

for epoch in range(EPOCHS):
    for data in trainset:
        X, y = data

        # output of the network
        input = net(X.view(-1, 784)

        # calculating the loss
        loss = F.nll_loss(input, y)  # calc and grab the loss value
        loss.backward()  # apply this loss backwards thru the network's parameters
        optimizer.step()  # attempt to optimize weights to account for loss/gradients

    print(loss)  # print loss. We hope loss (a measure of wrong-ness) declines!
``````

But this keeps showing an error at `loss = F.nll_loss(input, y)`, and the error is a `SyntaxError`. How do I solve it?
I am new to deep learning, but I have experience with sklearn.

What kind of error are you seeing?
Could you post the complete error message with the stack trace here, please?

Often `F.nll_loss` creates a shape mismatch error, since for a multi-class classification use case the model output is expected to contain log probabilities (apply `F.log_softmax` as the last activation on the output) and have the shape `[batch_size, nb_classes]`. The target should be a `LongTensor` with the shape `[batch_size]`, containing the class indices in the range `[0, nb_classes-1]`.
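A minimal shape check along these lines (the sizes are made up for illustration):

``````
import torch
import torch.nn.functional as F

batch_size, nb_classes = 8, 10
logits = torch.randn(batch_size, nb_classes)
log_probs = F.log_softmax(logits, dim=1)              # [batch_size, nb_classes]
target = torch.randint(0, nb_classes, (batch_size,))  # LongTensor in [0, nb_classes-1]
loss = F.nll_loss(log_probs, target)
print(loss)
``````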

When I use `single_label_classification`, my labels are either `0` or `1`.

My model has a `Linear` layer as its last layer, and I get this output:

`SequenceClassifierOutput(loss=tensor(0.3405, grad_fn=<NllLossBackward>), logits=tensor([[ 0.5105, -0.3917]], grad_fn=<AddmmBackward>), hidden_states=None, attentions=None)`

How is the loss calculated here from the logits?

For a single binary output with labels in `[0, 1]` you should use `nn.BCEWithLogitsLoss` instead of `nn.NLLLoss`.
The output of the model should then be a single value containing the logit and should be passed directly to the criterion without applying `sigmoid` on it.
You could also use `sigmoid` and `nn.BCELoss` but the numerical stability would be worse.
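A minimal sketch of that setup (the tensors below are placeholders for your model output and labels):

``````
import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()
logits = torch.randn(4, 1)                       # raw model outputs, no sigmoid applied
target = torch.tensor([[0.], [1.], [1.], [0.]])  # float targets in {0., 1.}, same shape as logits
loss = criterion(logits, target)
print(loss)
``````

Regarding the printed value: with `problem_type="single_label_classification"` the model applies cross entropy to the raw logits internally (you can see this in the `grad_fn=<NllLossBackward>` of your output). Assuming the label of the shown sample was `0`, the value can be reproduced like this:

``````
import torch
import torch.nn as nn

logits = torch.tensor([[0.5105, -0.3917]])
target = torch.tensor([0])  # assumed label of the shown sample

# CrossEntropyLoss = LogSoftmax + NLLLoss applied to the raw logits
print(nn.CrossEntropyLoss()(logits, target))  # tensor(0.3405)
``````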

But how can I do that explicitly here? It’s picked up automatically by the model.

Also, how does it arrive at the value 0.3405?

``````
model = tr.XLMRobertaForSequenceClassification.from_pretrained(
    "/home/stb/AIML/model_mlm_vocab_exp1_20epocs",
    problem_type="single_label_classification",
    num_labels=2,
    ignore_mismatched_sizes=True,
    id2label={0: 'negative', 1: 'positive'},
)
``````
``````
training_args = tr.TrainingArguments(
    # report_to='wandb',
    output_dir='/home/stb/AIML/results_vocab_ext_exp1',  # output directory
    overwrite_output_dir=True,
    num_train_epochs=10,             # total number of training epochs
    per_device_train_batch_size=10,  # batch size per device during training
    per_device_eval_batch_size=10,   # batch size for evaluation
    learning_rate=2e-5,
    warmup_steps=200,                # number of warmup steps for learning rate scheduler
    weight_decay=0.01,               # strength of weight decay
    logging_dir='./logs_exp1',       # directory for storing logs
    logging_steps=6000,
    evaluation_strategy="epoch",
    save_strategy="epoch"