the input given is expected to contain log-probabilities and is not restricted to a 2D Tensor. The targets are given as probabilities (i.e. without taking the logarithm).
So it’s not an error.
The reason is numerical stability: if the loss took probabilities and applied the logarithm itself, that would be less stable than receiving log-probabilities directly as the argument (for example, the output of log_softmax).
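To make the stability point concrete, here is a small pure-Python sketch. It is only an illustration, not PyTorch's actual implementation; the helpers `softmax` and `log_softmax` below are hypothetical stand-ins written for this example. With a very negative logit, the probability underflows to exactly 0, so its logarithm can no longer be recovered, while the log-probability computed directly stays finite.

```python
import math

def softmax(logits):
    # Hypothetical helper: probabilities via the usual max-shifted softmax.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def log_softmax(logits):
    # Hypothetical helper: log-probabilities via the log-sum-exp identity,
    # without ever materialising the probabilities themselves.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

logits = [0.0, -1000.0]

probs = softmax(logits)          # second entry underflows to 0.0
log_probs = log_softmax(logits)  # second entry stays finite

try:
    log_of_probs = [math.log(p) for p in probs]
    recovered = True
except ValueError:
    # math.log(0.0) is undefined, so the information is lost
    recovered = False

print(probs)
print(log_probs)
print(recovered)
```

In PyTorch the same situation would typically show up as `-inf` or `nan` values rather than an exception, which is why passing the output of `log_softmax` is the safer route.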
Also, it’s the same convention as NLLLoss, which likewise expects log-probabilities as its input.
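A minimal sketch of that convention in practice, assuming random tensors in place of real model outputs (the shapes, seed, and labels are made up for illustration): the input goes through `log_softmax`, the KL target goes through plain `softmax`, and the same log-probabilities can be fed to `nll_loss`.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 5)         # hypothetical model outputs: batch of 4, 5 classes
target_logits = torch.randn(4, 5)  # hypothetical source of the target distribution

log_q = F.log_softmax(logits, dim=-1)  # input: log-probabilities
p = F.softmax(target_logits, dim=-1)   # target: probabilities (no log)

kl = F.kl_div(log_q, p, reduction="batchmean")

# Manual check: sum over classes of p * (log p - log q), averaged over the batch
manual = (p * (p.log() - log_q)).sum(dim=-1).mean()

# NLLLoss follows the same input convention: log-probabilities plus class indices
labels = torch.tensor([0, 2, 1, 4])
nll = F.nll_loss(log_q, labels)

print(kl.item(), manual.item(), nll.item())
```

Passing `F.softmax(logits, dim=-1)` as the input instead would not raise an error, but the result would no longer be the KL divergence.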