Loss nll - formula in documentation


I am a bit confused by the documented formula for the negative log-likelihood loss:

l_n = -w_(y_n) * x_(n, y_n)

What are those x_(n, y_n)? The documentation says x is the input, but the loss is not calculated from the input. A loss should be calculated from the output and the target, should it not?


Edit: Is the following assumption correct?
x_(n,y_n) is the entry of the network output vector that corresponds to the probability of the target y_n, where n denotes the sample.
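This indexing can be checked numerically. The following sketch (with made-up shapes: 3 samples, 4 classes) confirms that NLLLoss with its default mean reduction picks out the entry x[n, y_n] for each sample n, negates it, and averages:

```python
import torch

# Toy "network output" x: 3 samples, 4 classes, as log-probabilities,
# and integer class targets y.
x = torch.log_softmax(torch.randn(3, 4), dim=1)
y = torch.tensor([2, 0, 3])

loss = torch.nn.NLLLoss()(x, y)

# Manually pick out x[n, y_n] for each sample n, negate, and average.
manual = -x[torch.arange(3), y].mean()

print(torch.allclose(loss, manual))  # True
```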

Hi Physics!

Yes, the terminology in the documentation is somewhat unfortunate.

The argument to the loss functions that pytorch refers to as input is
indeed the “output” of the network (or derived from it). I prefer to call
this the “prediction.” The loss function then compares the “prediction”
with the “ground truth” label, which is the argument that pytorch calls
the target.

I guess one can think of the “prediction” being the “input” to the loss
function (but so is the target).

So “output” of model = “prediction” = “input” to loss function. But using
the term input in the loss-function documentation has always seemed
a bit confusing to me, as it suggests the input to the model, rather
than the output of the model. But I’ve made my peace with the pytorch
terminology.

Not quite. What you denote “x_(n,y_n)” is the predicted log-probability
of the sample corresponding to target y_n (rather than the probability).

So, typically, the outputs of the final Linear layer in the model are the
predicted raw-score logits. When passed through LogSoftmax, you
get the predicted log-probabilities that you then pass into NLLLoss.
(If you pass the logits through Softmax, you get probabilities, but it’s
numerically more stable to work with the log-probabilities.)
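As a minimal sketch of that pipeline (the layer sizes and batch shape here are made up for illustration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy model: 10 input features, 3 classes.
model = nn.Sequential(
    nn.Linear(10, 3),      # final Linear layer -> raw-score logits
    nn.LogSoftmax(dim=1),  # logits -> predicted log-probabilities
)

inp = torch.randn(5, 10)              # input to the *model*
target = torch.randint(0, 3, (5,))    # ground-truth class labels

log_probs = model(inp)                # the "input" to NLLLoss
loss = nn.NLLLoss()(log_probs, target)
print(loss.item())
```

Note that log-probabilities are always non-positive, so the resulting loss is always non-negative.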

(For convenience, pytorch’s CrossEntropyLoss combines LogSoftmax
together with NLLLoss so you can pass in the logits directly without
passing them through a separate LogSoftmax function.)
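The equivalence can be verified directly; with the same logits and targets (shapes made up for illustration), the two-step and one-step versions give the same loss:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(5, 3)            # raw scores from a final Linear layer
target = torch.randint(0, 3, (5,))

# Two-step: LogSoftmax, then NLLLoss on the log-probabilities.
two_step = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), target)

# One-step: CrossEntropyLoss applied to the raw logits directly.
one_step = nn.CrossEntropyLoss()(logits, target)

print(torch.allclose(two_step, one_step))  # True
```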


K. Frank


Thank you very much, K. Frank, for this detailed explanation!
I guess calling the network output “input” makes sense from a programmer’s perspective, since it is an input to a function.

It is all clear now.