Which is the right loss?

Jordan_Howell · November 15, 2019, 10:52pm

Should I use softmax on the last layer with cross-entropy loss for binary classification? Is there a cheat sheet out there on how to pair up the last layer and the loss criterion?

ptrblck · November 15, 2019, 10:54pm

The docs for the loss functions (e.g. nn.CrossEntropyLoss) provide the necessary information on how to pass the inputs and targets.
nn.CrossEntropyLoss expects logits, so you shouldn’t apply a softmax on your model outputs.

Jordan_Howell · November 15, 2019, 11:39pm

So I don’t see a ‘logits’ in the docs. I just let the linear layers run when constructing the model class and apply the loss to whatever the final layer puts out?

ptrblck · November 15, 2019, 11:41pm

Yes, just pass the output of your last (linear) layer directly to nn.CrossEntropyLoss, as internally F.log_softmax and nn.NLLLoss will be applied.

In the docs “scores” is used, so you are right about the missing “logits”.

The input is expected to contain raw, unnormalized scores for each class.
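
To make the equivalence above concrete, here is a minimal sketch. The batch size, class count, and tensor values are made up for illustration. It feeds the same raw logits through nn.CrossEntropyLoss directly and through F.log_softmax followed by nn.NLLLoss:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical batch: 4 samples, 2 classes (binary classification with two outputs)
logits = torch.randn(4, 2)             # stands in for the raw output of the last linear layer
targets = torch.tensor([0, 1, 1, 0])   # class indices, not one-hot vectors

# Pairing 1: raw logits go straight into nn.CrossEntropyLoss
loss_ce = nn.CrossEntropyLoss()(logits, targets)

# Pairing 2: apply log_softmax over the class dimension, then nn.NLLLoss
loss_nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)

# Both pairings should give the same value up to floating point error
print(loss_ce.item(), loss_nll.item())
```

With the default mean reduction used here, the two printed values should agree, which is why no extra activation is needed before nn.CrossEntropyLoss.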

Jordan_Howell · November 15, 2019, 11:44pm

That’s awesome. Thanks.

Jordan_Howell · November 16, 2019, 12:32am

Hi @ptrblck. I expected that when I take the exponent of the predicted values after sending the data through the model, they would sum to 1. I don’t get that. Am I missing something?

ptrblck · November 16, 2019, 12:47am

If you apply F.log_softmax manually, the exponent of this output should sum to one.
Which output are you using at the moment? The model output or the (unreduced) loss function output?

Jordan_Howell · November 16, 2019, 12:51am

The output is just the output layer by itself. Should I switch to log_softmax(x)?

ptrblck · November 16, 2019, 12:53am

In that case, you cannot expect the exponent of logits to sum to one.
To get the probabilities, you could use softmax on the output of the last layer.
Just make sure not to pass the softmax’ed output to nn.CrossEntropyLoss.
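
As a small illustration of the point about logits versus probabilities, the sketch below uses hand-picked logits (not taken from any real model) and compares exp() of the raw logits with softmax and with exp() of log_softmax:

```python
import torch
import torch.nn.functional as F

# Hypothetical logits for 3 samples and 2 classes
logits = torch.tensor([[2.0, -1.0],
                       [0.5, 0.5],
                       [-3.0, 1.0]])

# exp() of raw logits is unnormalized, so the rows do not sum to one
exp_logits_sum = logits.exp().sum(dim=1)

# softmax normalizes over the class dimension, so each row sums to one
probs = F.softmax(logits, dim=1)

# exp() of log_softmax recovers the same probabilities
probs_from_log = F.log_softmax(logits, dim=1).exp()

print(exp_logits_sum)
print(probs.sum(dim=1))
print(probs_from_log.sum(dim=1))
```

Only the last two printed tensors should contain ones. The first shows why summing the exponent of the raw model output does not give 1.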

Jordan_Howell · November 16, 2019, 1:06am

So I use log_softmax() on the output with nn.CrossEntropyLoss? Or log_softmax() with nn.NLLLoss?

ptrblck · November 16, 2019, 1:22am

You can use:

raw logits (no activation function at the end, just the raw output of the last layer) + nn.CrossEntropyLoss
F.log_softmax on the model output + nn.NLLLoss

If you need to see the probabilities for debug/printing purpose:

use softmax on the raw logits and print the output
use exp() on the log_softmax output and print the output

Make sure to not pass the probabilities to the loss functions.
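
To show why this matters, here is a rough sketch with made-up logits and targets. Passing probabilities to nn.CrossEntropyLoss raises no error, but softmax is effectively applied twice, so the loss no longer reflects how confident the model actually is:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical confident and correct predictions for 2 samples, 2 classes
logits = torch.tensor([[4.0, -2.0],
                       [-1.0, 3.0]])
targets = torch.tensor([0, 1])

criterion = nn.CrossEntropyLoss()

# Intended usage: raw logits
loss_correct = criterion(logits, targets)

# Mistake: probabilities are treated as logits and normalized a second time
loss_wrong = criterion(F.softmax(logits, dim=1), targets)

print(loss_correct.item(), loss_wrong.item())
```

Because probabilities are confined to [0, 1], the second softmax squashes them toward a uniform distribution. In this two-class setup the mistaken loss cannot fall below log(1 + e^-1), roughly 0.313, however good the predictions are, so training would stall.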