Hello Doubt -

This may be a duplicate; see below*.

The cross-entropy loss and the (negative) log-likelihood are the same in the following sense: if you apply Pytorch’s `CrossEntropyLoss` to your output layer, you get the same result as applying Pytorch’s `NLLLoss` to a `LogSoftmax` layer added after your original output layer.

(I suspect – but don’t know for a fact – that using `CrossEntropyLoss` will be more efficient because it can collapse some calculations together, and doesn’t introduce an additional layer.)
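To make this concrete, here is a minimal sketch of the equivalence (the shapes, values, and class labels are just made up for illustration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# made-up raw output-layer values ("logits") for a batch of
# four samples and three classes, plus made-up target labels
output = torch.randn(4, 3)
target = torch.tensor([0, 2, 1, 2])

# CrossEntropyLoss applied directly to the raw output layer
loss_ce = nn.CrossEntropyLoss()(output, target)

# NLLLoss applied to a LogSoftmax layer added after the output layer
loss_nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(output), target)

print(torch.allclose(loss_ce, loss_nll))   # True -- the two losses agree
```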

You are trying to maximize the “likelihood” of your model parameters (weights) having the right values. Maximizing the likelihood is the same as maximizing the log-likelihood, which is the same as minimizing the negative-log-likelihood. For the classification problem, the cross-entropy is the negative-log-likelihood. (The “math” definition of cross-entropy applies to your output layer being a (discrete) probability distribution. Pytorch’s `CrossEntropyLoss` implicitly adds a soft-max that “normalizes” your output layer into such a probability distribution.)
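As a sketch of that last point (again with made-up values): because the target is a single correct class, the cross-entropy reduces to minus the log of the probability that the soft-maxed output assigns to that class, which is exactly the negative-log-likelihood:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

output = torch.randn(4, 3)            # made-up raw output for 4 samples, 3 classes
target = torch.tensor([0, 2, 1, 2])   # made-up class labels

# "normalize" the raw output into a probability distribution with soft-max
probs = torch.softmax(output, dim=1)

# negative-log-likelihood of the correct class, averaged over the batch
nll = -probs[torch.arange(4), target].log().mean()

# CrossEntropyLoss computed on the raw output gives the same number
ce = F.cross_entropy(output, target)

print(torch.allclose(nll, ce))   # True
```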

Wikipedia has some explanation of the equivalence of negative-log-likelihood and cross-entropy.

*Possible duplicate:

Best.

K. Frank