Hi, I want to design a network for a prediction task, but my prediction task is quite special. I don't want to train my model to predict labels, but rather the probabilities of those labels (my targets are probabilities of the labels, not certainties). So my question is: which loss function should I choose for this task? Do I still use cross_entropy, or something else?
Hi Nassim!
CrossEntropyLoss would be a perfectly appropriate – and likely the best – loss function for this use case. (Older versions of CrossEntropyLoss did not support probabilistic targets, but more recent versions, including the current stable release, 1.12.1, do.)
Assuming that target is a proper probability distribution, i.e., 0.0 <= target <= 1.0 and target.sum() == 1.0, CrossEntropyLoss()(pred, target) will take on its minimum precisely when the probabilities derived from pred, namely pred.softmax(), are equal to target. (Note that in general this minimum value will not be 0.0.)
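A quick numerical check of this minimum property (the target values and tensor shapes here are just an illustration, and probabilistic targets require PyTorch 1.10 or later):

```python
import torch

torch.manual_seed(0)

# A probabilistic target: entries in [0, 1] that sum to 1 along the class dim.
target = torch.tensor([[0.25, 0.50, 0.25]])

loss_fn = torch.nn.CrossEntropyLoss()

# Logits whose softmax exactly equals the target attain the minimum,
# because softmax(log(p)) == p when p sums to 1.
pred_optimal = target.log()
loss_min = loss_fn(pred_optimal, target)

# That minimum is the entropy of the target -- not 0.0.
entropy = -(target * target.log()).sum(dim=1).mean()

# Any other logits give a strictly larger loss.
pred_other = torch.randn(1, 3)
loss_other = loss_fn(pred_other, target)
```

The minimum equals the entropy of target because CrossEntropyLoss with probabilistic targets computes the full cross-entropy, which is bounded below by the target's entropy rather than by zero.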
Let’s say that target has shape [nBatch, nClass], that is, for each sample your target is a list of nClass label probabilities. You would want the final layer of your network to be a Linear with out_features = nClass. Then feed the output of that Linear directly to CrossEntropyLoss, with no intervening softmax() nor any other non-linearity.
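As a sketch, a minimal model wired up this way (the layer sizes, hidden ReLU layer, and random dummy data are all placeholder choices):

```python
import torch

torch.manual_seed(0)
nBatch, nFeatures, nClass = 8, 16, 3

# Hypothetical minimal network: any hidden layers you like, ending in a
# Linear with out_features = nClass and *no* final softmax.
model = torch.nn.Sequential(
    torch.nn.Linear(nFeatures, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, nClass),  # raw logits out
)

loss_fn = torch.nn.CrossEntropyLoss()

x = torch.randn(nBatch, nFeatures)
# Dummy probabilistic targets of shape [nBatch, nClass] that sum to 1 per row.
target = torch.softmax(torch.randn(nBatch, nClass), dim=1)

logits = model(x)               # no softmax here
loss = loss_fn(logits, target)  # CrossEntropyLoss applies log_softmax internally
loss.backward()
```

Applying softmax() yourself before the loss would make the loss compute log_softmax twice, which silently degrades training, so the raw logits go straight in.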
Just to be clear, let’s say that target = [0.75, 0.20, 0.05]. Your network will not try to learn to predict [1.0, 0.0, 0.0]. Instead, it will try to learn to predict* precisely [0.75, 0.20, 0.05].
*) More correctly, your network will learn to predict unnormalized log-probabilities that yield [0.75, 0.20, 0.05] when passed through softmax().
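You can see this behavior directly by optimizing a single free logit vector against a fixed probabilistic target (the target values, learning rate, and step count below are illustrative):

```python
import torch

# A fixed probabilistic target that sums to 1.
target = torch.tensor([[0.25, 0.50, 0.25]])

# Treat the logits themselves as the trainable parameters.
pred = torch.zeros(1, 3, requires_grad=True)
loss_fn = torch.nn.CrossEntropyLoss()
opt = torch.optim.SGD([pred], lr=1.0)

for _ in range(500):
    opt.zero_grad()
    loss_fn(pred, target).backward()
    opt.step()

# After optimization, softmax(pred) matches the target probabilities,
# not a one-hot vector for the most likely class.
probs = pred.softmax(dim=1)
```

Because the gradient of cross-entropy with respect to the logits is softmax(pred) - target, gradient descent drives softmax(pred) toward target itself rather than toward a hard label.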
Best.
K. Frank
Hi KFrank, thanks for the answer!