From the official documentation here:

```
>>> import torch
>>> import torch.nn.functional as F
>>> # input is of size N x C = 3 x 5
>>> input = torch.randn(3, 5, requires_grad=True)
>>> # each element in target has to have 0 <= value < C
>>> target = torch.tensor([1, 0, 4])
>>> output = F.nll_loss(F.log_softmax(input, dim=1), target)
```

I would expect `output` to be `-torch.log(torch.tensor(1 / C))`, since a random activation tensor should produce softmax probabilities of roughly `1 / C` per class, and the negative log likelihood of such a uniform prediction is `-log(1/num_classes)`. Where is my logic flawed?
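For reference, here is a quick empirical check I put together (a sketch assuming a recent PyTorch; the averaging loop, trial count, and seed are my own additions). With all-zero logits the softmax is exactly uniform and the loss is exactly `-log(1/C) = log(C)`, but with random logits each draw gives a different, non-uniform softmax, so a single loss value is itself random rather than equal to `log(C)`:

```python
import math

import torch
import torch.nn.functional as F

C = 5  # number of classes
target = torch.tensor([1, 0, 4])

# Uniform case: all-zero logits give softmax probabilities of exactly 1/C,
# so the loss is exactly -log(1/C) = log(C)
uniform_logits = torch.zeros(3, C)
uniform_loss = F.nll_loss(F.log_softmax(uniform_logits, dim=1), target)
print(uniform_loss.item(), math.log(C))  # these two agree

# Random case: every draw of logits gives a different loss value,
# so we look at the average over many trials instead of a single draw
torch.manual_seed(0)
losses = []
for _ in range(5_000):
    logits = torch.randn(3, C)
    losses.append(F.nll_loss(F.log_softmax(logits, dim=1), target).item())
mean_loss = sum(losses) / len(losses)
print(mean_loss, math.log(C))  # the average does not match log(C) either
```

In my runs the per-trial losses scatter widely around `log(C)` and their average comes out noticeably above it, which is the gap I am asking about.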