How does PyTorch's cross-entropy loss transform logits with a soft probability target vector?

If I understand correctly, PyTorch’s nn.CrossEntropyLoss() accepts soft probability target vectors. So, for example, y (the target vector) could be [0.8, 0.2, 0.8]^T, representing an equal (0.8) probability for classes 1 and 3 and a probability of 0.2 for the second class. Obviously, this is not a normalized probability distribution, as it sums to 1.8. Now, x are the raw logits produced by the model; does nn.CrossEntropyLoss() still apply a regular (log)softmax to the logits? If so, as I understand it, the logits are transformed into a normalized probability distribution and will never match the un-normalized soft probability target. If the (log)softmax is not applied, how does the loss function act on the raw logits instead (if at all)? Thank you in advance!

Hi Nassim!

Yes, the input consists of unnormalized log-probabilities (contrary to the documentation,
they’re not really logits), and they are treated the same whether or not the “probabilities” that
make up the soft target form a properly-normalized probability distribution. (The values
in target don’t even have to be individually proper probabilities between zero and one.
They just get plugged into the cross-entropy formula without being checked or normalized.)
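
In other words, for a floating-point (“probabilities”) target, the loss is just the negative dot
product of the target with log_softmax of the input, averaged over the batch, whatever values the
target happens to hold. Here is a minimal sketch of that computation (the shapes and names are made
up for illustration; this is the formula spelled out, not the library’s implementation):

import torch
import torch.nn.functional as F

# hypothetical batch: 4 samples, 3 classes (shapes chosen just for illustration)
logits = torch.randn(4, 3)
soft_targets = torch.rand(4, 3)   # deliberately not normalized

# log_softmax is applied to the input; the raw target values are used as-is
# as weights in the cross-entropy sum, and the per-sample losses are then averaged
manual = -(soft_targets * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
builtin = torch.nn.CrossEntropyLoss()(logits, soft_targets)

print(torch.allclose(manual, builtin))   # expect: True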

The documentation is silent (as far as I can tell) on this issue, but a quick test shows it to
be the case:

>>> import torch
>>> torch.__version__
'2.6.0+cu126'
>>> _ = torch.manual_seed (2025)
>>> l = torch.randn (10)
>>> t = torch.rand (10)
>>> torch.nn.CrossEntropyLoss() (l, t)
tensor(15.4764)
>>> torch.nn.CrossEntropyLoss() (l, 10 * t)
tensor(154.7643)
>>> torch.nn.CrossEntropyLoss() (l, 0.1 * t)
tensor(1.5476)
>>> torch.nn.CrossEntropyLoss() (l, -t)
tensor(-15.4764)
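
The factor-of-ten scalings and the sign flip above are exactly what the formula predicts, since the
target enters the loss linearly. As a quick cross-check (reusing the l and t from the session above;
a sketch, not part of the recorded output):

manual = -(t * torch.log_softmax(l, dim=0)).sum()
print(torch.allclose(torch.nn.CrossEntropyLoss()(l, t), manual))   # expect: True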

Best.

K. Frank
