# Loss Function for predicting the probability distribution of labels

Hi, I want to design a network for a prediction task, but my task is a bit special. I don’t want to train my model to predict labels, but rather the probabilities of those labels (my targets are probabilities of the labels, not certainties). So my question is: which loss function should I choose for this task? Do I still use cross-entropy, or do I use something else?

Hi Nassim!

`CrossEntropyLoss` would be a perfectly appropriate – and likely the
best – loss function for this use case.

(Older versions of `CrossEntropyLoss` did not support probabilistic targets,
but more recent versions, including the current stable release, 1.12.1, do.)

Assuming that `target` is a proper probability distribution, i.e.,
`0.0 <= target <= 1.0` and `target.sum() == 1.0`,
`CrossEntropyLoss()(pred, target)` will take on its minimum precisely
when the probabilities derived from `pred`, `pred.softmax (dim = -1)`, are
equal to `target`. (Note that in general this minimum value will not be `0.0`.)
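To make this concrete, here is a minimal sketch (assuming a recent PyTorch, 1.10 or later, where `CrossEntropyLoss` accepts probabilistic targets). Taking `pred = target.log()` gives logits whose softmax is exactly `target`, and the resulting loss equals the entropy of `target`, which is the minimum, not `0.0`:

```python
import torch

# a proper probability distribution over 3 classes (one sample in the batch)
target = torch.tensor([[0.50, 0.30, 0.20]])

loss_fn = torch.nn.CrossEntropyLoss()

# logits whose softmax reproduces target: the log-probabilities work
pred = target.log()
assert torch.allclose(pred.softmax(dim=1), target)

loss = loss_fn(pred, target)

# the minimum of the loss is the entropy of target, not 0.0
entropy = -(target * target.log()).sum(dim=1).mean()
print(loss, entropy)  # the two values agree
```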

Let’s say that `target` has shape `[nBatch, nClass]`, that is, for
each sample your `target` is a list of `nClass` label probabilities.
You would want the final layer of your network to be a `Linear` with
`out_features = nClass`. Then feed the output of that `Linear`
directly to `CrossEntropyLoss` with no intervening `softmax()` nor
other non-linearity.
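A sketch of what that looks like in code (the hidden-layer sizes and the `Sequential` model here are just placeholders; the only requirement is the final `Linear` with `out_features = nClass` and no trailing `softmax()`):

```python
import torch

nBatch, nFeatures, nClass = 8, 16, 3

# hypothetical network; the last layer is a Linear with out_features = nClass
model = torch.nn.Sequential(
    torch.nn.Linear(nFeatures, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, nClass),  # no softmax after this layer
)

loss_fn = torch.nn.CrossEntropyLoss()

x = torch.randn(nBatch, nFeatures)

# random probabilistic targets: normalize each row to sum to 1.0
target = torch.rand(nBatch, nClass)
target = target / target.sum(dim=1, keepdim=True)

pred = model(x)               # raw logits, shape [nBatch, nClass]
loss = loss_fn(pred, target)  # log_softmax() is applied internally
loss.backward()
```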

Just to be clear, let’s say that `target = [0.50, 0.30, 0.20]` (note that
these probabilities sum to `1.0`). Your network will not try to learn to
predict `[1.0, 0.0, 0.0]`. Instead, it will try to learn to predict*
precisely `[0.50, 0.30, 0.20]`.

*) More correctly, your network will learn to predict unnormalized
log-probabilities that yield `[0.50, 0.30, 0.20]` when passed through
`softmax()`.
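As a toy check of this last point (not part of the original answer, just an illustration), you can skip the network entirely and optimize a free logit tensor with plain SGD: the softmax of the optimized logits converges to the target distribution, not to a one-hot vector:

```python
import torch

target = torch.tensor([[0.50, 0.30, 0.20]])  # a valid probability distribution

pred = torch.zeros(1, 3, requires_grad=True)  # free logits to optimize
opt = torch.optim.SGD([pred], lr=1.0)
loss_fn = torch.nn.CrossEntropyLoss()

for _ in range(500):
    opt.zero_grad()
    loss_fn(pred, target).backward()
    opt.step()

# softmax of the optimized logits recovers the target probabilities
probs = pred.softmax(dim=1)
print(probs)  # close to [[0.50, 0.30, 0.20]]
```

(The logits themselves are only determined up to an additive constant, but their softmax is unique.)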

Best.

K. Frank


Hi KFrank, thanks for the answer!