Expected value of nll_loss

From the official documentation here:

>>> import torch
>>> import torch.nn.functional as F
>>> # input is of size N x C = 3 x 5
>>> input = torch.randn(3, 5, requires_grad=True)
>>> # each element in target has to have 0 <= value < C
>>> target = torch.tensor([1, 0, 4])
>>> output = F.nll_loss(F.log_softmax(input, dim=1), target)

I would expect the output to be output == -torch.log(torch.tensor(1 / C)), since a random activation tensor should produce roughly uniform softmax probabilities, and hence the expected negative log likelihood would be -log(1/num_classes) = log(num_classes). Where is my logic flawed?
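For reference, the value being expected here is easy to compute directly; with C = 5 as in the docs example:

```python
import math

C = 5  # number of classes, matching the docs example
expected = -math.log(1 / C)  # equivalently math.log(C)
print(expected)  # ≈ 1.6094
```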

I think you might not be considering the “confidence” of the predictions.
I.e., even if your model predicts random classes, the loss will be much higher if you scale the logits:

input = torch.randn(3, 5) * 100
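To make the effect visible, here is a sketch averaging over many samples (the sample count N = 10000 and the seed are my own choices, not from the original post): the scaled logits yield a near-one-hot softmax, so every confidently wrong prediction incurs a huge penalty.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
C = 5
N = 10000  # many samples, so the mean loss is stable
target = torch.randint(0, C, (N,))

# Large-magnitude random logits: softmax is nearly one-hot, so a
# confidently wrong prediction is penalized heavily.
logits = torch.randn(N, C) * 100
loss = F.nll_loss(F.log_softmax(logits, dim=1), target)
print(loss.item())  # far larger than log(5) ≈ 1.61
```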

However, you should get approximately the expected loss of log(num_classes) if you keep the logits small, so that the softmax output stays close to uniform:

input = torch.zeros(3, 5) + torch.randn(3, 5) * 1e-3
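The same averaged sketch with these tiny logits (again, N = 10000 and the seed are my own choices) recovers the asker's expected value, since the softmax output is now almost exactly uniform:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
C = 5
N = 10000
target = torch.randint(0, C, (N,))

# Tiny logits: softmax is almost uniform, so each class gets ~1/C
# probability and the loss approaches -log(1/C) = log(C).
logits = torch.zeros(N, C) + torch.randn(N, C) * 1e-3
loss = F.nll_loss(F.log_softmax(logits, dim=1), target)
print(loss.item())  # ≈ log(5) ≈ 1.6094
```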