Hello Doubt -
This may be a duplicate; see below*.
The cross-entropy loss and the (negative) log-likelihood are the same in the following sense: If you apply Pytorch’s CrossEntropyLoss to your output layer, you get the same result as applying Pytorch’s NLLLoss to a LogSoftmax layer added after your original output layer. (I suspect – but don’t know for a fact – that using CrossEntropyLoss will be more efficient because it can collapse some calculations together, and doesn’t introduce an additional layer.)
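Here is a minimal sketch (with made-up logits and integer class labels, just for illustration) that checks this equivalence numerically:

```python
import torch

torch.manual_seed(0)

# pretend "output layer" values (raw logits) for a batch of 4 samples, 3 classes
logits = torch.randn(4, 3)
targets = torch.tensor([0, 2, 1, 2])  # integer class labels

# CrossEntropyLoss applied directly to the raw output layer
ce_loss = torch.nn.CrossEntropyLoss()(logits, targets)

# NLLLoss applied to a LogSoftmax layer added after the output layer
log_probs = torch.nn.LogSoftmax(dim=1)(logits)
nll_loss = torch.nn.NLLLoss()(log_probs, targets)

print(ce_loss.item(), nll_loss.item())  # the two values agree
```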
You are trying to maximize the “likelihood” of your model parameters (weights) having the right values. Maximizing the likelihood is the same as maximizing the log-likelihood, which is the same as minimizing the negative-log-likelihood.

For the classification problem, the cross-entropy is the negative-log-likelihood. (The “math” definition of cross-entropy applies to your output layer being a (discrete) probability distribution. Pytorch’s CrossEntropyLoss implicitly adds a soft-max that “normalizes” your output layer into such a probability distribution.)
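To make that connection concrete, here is a small sketch (again with made-up logits) showing that CrossEntropyLoss is just the batch average of -log of the soft-max-normalized probability the model assigns to the correct class, i.e., the negative-log-likelihood:

```python
import torch

torch.manual_seed(0)
logits = torch.randn(4, 3)
targets = torch.tensor([0, 2, 1, 2])

# "normalize" the raw output layer into a probability distribution
probs = torch.softmax(logits, dim=1)

# negative-log-likelihood: -log of the probability of the correct class,
# averaged over the batch
nll_by_hand = -torch.log(probs[torch.arange(4), targets]).mean()

ce_loss = torch.nn.CrossEntropyLoss()(logits, targets)
print(nll_by_hand.item(), ce_loss.item())  # the two values agree
```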
Wikipedia has some explanation of the equivalence of negative-log-likelihood and cross-entropy.
*Possible duplicate:
Best.
K. Frank