From the official PyTorch documentation:
>>> import torch
>>> import torch.nn.functional as F
>>> # input is of size N x C = 3 x 5
>>> input = torch.randn(3, 5, requires_grad=True)
>>> # each element in target has to have 0 <= value < C
>>> target = torch.tensor([1, 0, 4])
>>> output = F.nll_loss(F.log_softmax(input, dim=1), target)
I would expect the output to satisfy `output == -torch.log(torch.tensor(1 / C))`: a random activation tensor should produce softmax outputs with roughly uniform probabilities (about 1/C per class), so the expected negative log likelihood should be -log(1/num_classes). Where is my logic flawed?
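
To make the comparison concrete, here is a minimal sketch of what I am checking; the seed and the number of draws (10,000) are arbitrary choices of mine, not from the documentation:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)  # for reproducibility only
C = 5

# The value my reasoning predicts: -log(1/C) = log(5) ~= 1.6094
predicted = -torch.log(torch.tensor(1.0 / C))

# Estimate the expected loss by averaging over many random inputs
losses = [
    F.nll_loss(F.log_softmax(torch.randn(3, C), dim=1),
               torch.tensor([1, 0, 4]))
    for _ in range(10_000)
]

print(f"predicted -log(1/C): {predicted.item():.4f}")
print(f"empirical mean loss: {torch.stack(losses).mean().item():.4f}")
```

Averaging over many draws should smooth out the noise from a single 3-sample batch, so under my reasoning I would expect the two printed numbers to agree.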