Loss Function for predicting the probability distribution of labels

Hi, I want to design a network for a prediction task, but my task is a bit special. I don't want to train my model to predict labels, but rather the probabilities of those labels (my targets are probabilities of the labels, not certainties). So my question is: which loss function should I choose for this task? Do I still use cross_entropy, or something else?

Hi Nassim!

CrossEntropyLoss would be a perfectly appropriate – and likely the
best – loss function for this use case.

(Older versions of CrossEntropyLoss did not support probabilistic targets,
but more recent versions, including the current stable release, 1.12.1, do.)

Assuming that target is a proper probability distribution, i.e.,
0.0 <= target <= 1.0 and target.sum() == 1.0,
CrossEntropyLoss()(pred, target) will take on its minimum precisely
when the probabilities derived from pred, pred.softmax (dim = 1), are
equal to target. (Note that in general this minimum value will not be 0.0.)
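To illustrate the point about the minimum, here is a small sketch (the target values are made up): when the logits reproduce the target distribution exactly, the loss equals the entropy of the target, which is greater than 0.0 for any non-one-hot target.

```python
import torch

# a proper probability distribution as the target (hypothetical values)
target = torch.tensor([[0.7, 0.2, 0.1]])

# logits whose softmax() exactly equals target -- the loss's minimizer
pred = target.log()

loss = torch.nn.CrossEntropyLoss()(pred, target)

# the minimum loss is the entropy of target, H(target), not 0.0
entropy = -(target * target.log()).sum()
```

Here loss and entropy agree, and both are strictly positive.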

Let’s say that target has shape [nBatch, nClass], that is, for
each sample your target is a list of nClass label probabilities.
You would want the final layer of your network to be a Linear with
out_features = nClass. Then feed the output of that Linear
directly to CrossEntropyLoss with no intervening softmax() nor
other non-linearity.
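As a concrete sketch of that setup (layer sizes and names are made up), the final Linear produces raw logits of size nClass, and those logits go straight into CrossEntropyLoss:

```python
import torch

nBatch, nFeatures, nClass = 4, 10, 3

# final layer is a Linear with out_features = nClass;
# no softmax() nor other non-linearity after it
model = torch.nn.Sequential(
    torch.nn.Linear(nFeatures, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, nClass),
)
loss_fn = torch.nn.CrossEntropyLoss()

x = torch.randn(nBatch, nFeatures)
# probabilistic targets: each row sums to 1.0
target = torch.softmax(torch.randn(nBatch, nClass), dim=1)

loss = loss_fn(model(x), target)   # logits fed directly to the loss
loss.backward()
```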

Just to be clear, let's say that target = [0.70, 0.20, 0.10]. Your
network will not try to learn to predict [1.0, 0.0, 0.0]. Instead, it
will try to learn to predict* precisely [0.70, 0.20, 0.10].

*) More correctly, your network will learn to predict unnormalized
log-probabilities that yield [0.70, 0.20, 0.10] when passed through
softmax().
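A toy demonstration of this behavior (the target values are hypothetical): optimizing a free logit vector against CrossEntropyLoss drives its softmax() toward the target distribution, not toward a one-hot vector.

```python
import torch

target = torch.tensor([[0.7, 0.2, 0.1]])      # hypothetical soft target
logits = torch.zeros(1, 3, requires_grad=True)  # free logits to optimize
opt = torch.optim.SGD([logits], lr=1.0)
loss_fn = torch.nn.CrossEntropyLoss()

for _ in range(500):
    opt.zero_grad()
    loss_fn(logits, target).backward()
    opt.step()

# the learned probabilities approach [0.7, 0.2, 0.1], not [1.0, 0.0, 0.0]
probs = logits.softmax(dim=1)
```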


K. Frank


Hi KFrank thanks for the answer!