I’m very confused about the difference between cross-entropy loss and log-likelihood loss when dealing with multi-class classification (including binary classification) or multi-label classification.

Could you explain the difference ?

Thanks.

Hello Doubt -

This may be a duplicate; see below*.

The cross-entropy loss and the (negative) log-likelihood are the same in the following sense: if you apply Pytorch’s `CrossEntropyLoss` to your output layer, you get the same result as applying Pytorch’s `NLLLoss` to a `LogSoftmax` layer added after your original output layer. (I suspect – but don’t know for a fact – that using `CrossEntropyLoss` will be more efficient because it can collapse some calculations together, and doesn’t introduce an additional layer.)
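To make this concrete, here is a small sketch (random logits and an arbitrary batch/class shape, chosen just for illustration) showing that the two formulations give the same loss value:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(4, 3)            # batch of 4 samples, 3 classes (raw scores)
targets = torch.tensor([0, 2, 1, 2])  # class index for each sample

# Option 1: CrossEntropyLoss applied directly to the raw logits
ce = nn.CrossEntropyLoss()(logits, targets)

# Option 2: LogSoftmax layer followed by NLLLoss
log_probs = nn.LogSoftmax(dim=1)(logits)
nll = nn.NLLLoss()(log_probs, targets)

print(torch.allclose(ce, nll))  # the two losses match
```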

You are trying to maximize the “likelihood” of your model parameters (weights) having the right values. Maximizing the likelihood is the same as maximizing the log-likelihood, which is the same as minimizing the negative log-likelihood. For the classification problem, the cross-entropy is the negative log-likelihood. (The “math” definition of cross-entropy applies when your output layer is a (discrete) probability distribution. Pytorch’s `CrossEntropyLoss` implicitly adds a soft-max that “normalizes” your output layer into such a probability distribution.)
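You can check this “math” definition directly: normalize the logits with a soft-max, then take the negative log of the probability assigned to the correct class. This is a sketch with a single made-up sample, and it reproduces what `cross_entropy` computes internally:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0]])  # raw scores for one sample, 3 classes
target = torch.tensor([0])                 # correct class is index 0

# Soft-max turns the logits into a discrete probability distribution
probs = torch.softmax(logits, dim=1)

# Negative log-likelihood of the correct class
manual = -torch.log(probs[0, target[0]])

# Pytorch's built-in cross-entropy on the raw logits
builtin = F.cross_entropy(logits, target)

print(torch.allclose(manual, builtin))
```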

Wikipedia has some explanation of the equivalence of negative log-likelihood and cross-entropy.

*Possible duplicate:

Best.

K. Frank

Thanks,

I have understood your words.

Hi, I wrote a little post about KL divergence, cross-entropy, and negative log-likelihood loss a few weeks ago: https://medium.com/@stepanulyanin/notes-on-deep-learning-theory-part-1-data-generating-process-31fdda2c8941. I hope you can find a few answers there too.

As pointed out above, negative log-likelihood and cross-entropy are conceptually the same. And cross-entropy is a generalization of binary cross-entropy to the case where you have multiple classes and use one-hot encoding. The confusion is mostly due to the naming in PyTorch, namely that the different loss functions expect different input representations. While they are conceptually the same, the way they are used implementation-wise is something to memorize/be aware of. I made a quick cheatsheet for my students because of that: https://github.com/rasbt/stat479-deep-learning-ss19/blob/master/other/pytorch-lossfunc-cheatsheet.md
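As a small illustration of that generalization (my own sketch, with random data): for two classes, the multi-class cross-entropy on a pair of logits agrees with binary cross-entropy applied to the difference of the logits, since `softmax([z0, z1])[1] == sigmoid(z1 - z0)`:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 2)             # 5 samples, 2 classes (raw scores)
targets = torch.randint(0, 2, (5,))    # class indices 0 or 1

# Multi-class cross-entropy on the two-class logits
ce = F.cross_entropy(logits, targets)

# Binary cross-entropy on the single logit z1 - z0, with 0/1 float targets
bce = F.binary_cross_entropy_with_logits(logits[:, 1] - logits[:, 0],
                                         targets.float())

print(torch.allclose(ce, bce))
```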