Hi, I want to design a network for a prediction task, but my task is a bit special. I don't want to train my model to predict labels, but rather the probabilities of those labels (my targets are probabilities of the labels, not certainties). So my question is: which loss function should I choose for this task? Do I still use cross-entropy, or something else?

Hi Nassim!

`CrossEntropyLoss` would be a perfectly appropriate – and likely the best – loss function for this use case. (Older versions of `CrossEntropyLoss` did not support probabilistic targets, but more recent versions, including the current stable release, 1.12.1, do.)

Assuming that `target` is a proper probability distribution, i.e., `0.0 <= target <= 1.0` and `target.sum() == 1.0`, `CrossEntropyLoss()(pred, target)` will take on its minimum precisely when the probabilities derived from `pred`, namely `pred.softmax()`, are equal to `target`. (Note that in general this minimum value will not be `0.0`.)
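A minimal sketch of this property (the target distribution and the "other" logits here are made up for illustration): the loss at logits whose softmax equals `target` is the entropy of `target` – nonzero, but smaller than for any other logits.

```python
import torch

# A hypothetical soft-label target: a proper probability distribution.
target = torch.tensor([[0.25, 0.50, 0.25]])

loss_fn = torch.nn.CrossEntropyLoss()

# Logits whose softmax reproduces target exactly (log-probabilities).
perfect_pred = target.log()
# Some arbitrary other logits, for comparison.
other_pred = torch.tensor([[2.0, -1.0, 0.5]])

loss_min = loss_fn(perfect_pred, target)     # equals the entropy of target
loss_other = loss_fn(other_pred, target)     # strictly larger
```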

Let’s say that `target` has shape `[nBatch, nClass]`, that is, for each sample your `target` is a list of `nClass` label probabilities. You would want the final layer of your network to be a `Linear` with `out_features = nClass`. Then feed the output of that `Linear` directly to `CrossEntropyLoss`, with no intervening `softmax()` nor other non-linearity.
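Concretely, that wiring might look like the following sketch (the layer sizes and random data are made up; only the shape of the final `Linear` and the absence of a `softmax()` before the loss matter):

```python
import torch

nFeatures, nClass = 8, 3  # hypothetical sizes

# Final layer is a Linear with out_features = nClass; its raw output
# (logits) goes straight into CrossEntropyLoss -- no softmax() in between.
model = torch.nn.Sequential(
    torch.nn.Linear(nFeatures, 16),
    torch.nn.ReLU(),
    torch.nn.Linear(16, nClass),
)
loss_fn = torch.nn.CrossEntropyLoss()

x = torch.randn(4, nFeatures)                           # batch of 4 samples
target = torch.softmax(torch.randn(4, nClass), dim=1)   # soft labels, each row sums to 1
loss = loss_fn(model(x), target)                        # target has shape [nBatch, nClass]
loss.backward()
```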

Just to be clear, let’s say that `target = [0.25, 0.50, 0.25]`. Your network will *not* try to learn to predict `[0.0, 1.0, 0.0]`. Instead, it will try to learn to predict* precisely `[0.25, 0.50, 0.25]`.

*) More correctly, your network will learn to predict unnormalized log-probabilities that yield `[0.25, 0.50, 0.25]` when passed through `softmax()`.
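You can see this directly by optimizing a bare logits tensor against such a target (a toy sketch – the learning rate and step count are arbitrary): gradient descent drives the logits to a point whose softmax matches the target probabilities, not a one-hot vector.

```python
import torch

target = torch.tensor([[0.25, 0.50, 0.25]])
logits = torch.zeros(1, 3, requires_grad=True)  # stand-in for the network's output
opt = torch.optim.SGD([logits], lr=1.0)
loss_fn = torch.nn.CrossEntropyLoss()

for _ in range(500):
    opt.zero_grad()
    loss_fn(logits, target).backward()
    opt.step()

# The learned logits are unnormalized log-probabilities: their softmax
# recovers the target distribution itself, not a hard label.
probs = logits.softmax(dim=1)
```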

Best.

K. Frank

Hi KFrank, thanks for the answer!