Which is the right loss?

Should I use softmax on the last layer with cross entropy loss for binary classification? Is there a cheat sheet out there on how to pair up the last layer and the loss criterion?

The docs for the loss functions (e.g. nn.CrossEntropyLoss) provide the necessary information on how to pass the inputs and targets.
nn.CrossEntropyLoss expects logits, so you shouldn’t apply a softmax on your model outputs.
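A minimal sketch of what that looks like (the model, shapes, and the 2-class setup here are made up for illustration; the last layer is a plain nn.Linear with no activation):

```python
import torch
import torch.nn as nn

# hypothetical tiny model: the last layer is a plain nn.Linear, no softmax
model = nn.Sequential(nn.Linear(10, 2))  # 2 output units for the binary case

x = torch.randn(4, 10)              # batch of 4 samples, 10 features each
target = torch.randint(0, 2, (4,))  # class indices 0 or 1

criterion = nn.CrossEntropyLoss()
logits = model(x)                   # raw, unnormalized scores ("logits")
loss = criterion(logits, target)    # no softmax applied anywhere
```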


So I don’t see ‘logits’ mentioned in the docs. Do I just let the linear layers run when constructing the model class and apply the loss to whatever the final layer puts out?

Yes, just pass the output of your last (linear) layer directly to nn.CrossEntropyLoss, as internally F.log_softmax and nn.NLLLoss will be applied.

In the docs “scores” is used, so you are right about the missing “logits”.

The input is expected to contain raw, unnormalized scores for each class.
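To illustrate what “internally” means, here is a sketch with made-up shapes, using the functional counterparts F.cross_entropy and F.nll_loss:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 2)          # stand-in for the raw model output
target = torch.randint(0, 2, (4,))

# cross entropy on raw scores ...
loss_ce = F.cross_entropy(logits, target)

# ... matches log_softmax followed by the negative log likelihood loss
loss_nll = F.nll_loss(F.log_softmax(logits, dim=1), target)

print(torch.allclose(loss_ce, loss_nll))  # True
```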


That’s awesome. Thanks.

Hi @ptrblck. I expected that when I take the exponent of the predicted values after sending the data through, they would sum to 1. I don’t get that. Am I missing something?

If you apply F.log_softmax manually, the exponent of this output should sum to one.
Which output are you using at the moment? The model output or the (unreduced) loss function output?
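For reference, a quick check (using a random tensor in place of your model output):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 2)                                  # raw model output

print(torch.exp(logits).sum(dim=1))                         # arbitrary values, not 1
print(torch.exp(F.log_softmax(logits, dim=1)).sum(dim=1))   # ~1 for every sample
```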

The output is just the output of the last layer by itself. Should I switch to log_softmax(x)?

In that case, you cannot expect the exponent of logits to sum to one.
To get the probabilities, you could use softmax on the output of the last layer.
Just make sure not to pass the softmax’ed output to nn.CrossEntropyLoss.
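Something along these lines (a sketch; the shapes and the random tensors stand in for your own model output and targets):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.randn(4, 2)               # placeholder for your model output
target = torch.randint(0, 2, (4,))
criterion = nn.CrossEntropyLoss()

probs = F.softmax(logits, dim=1)          # probabilities, only for printing/debugging
print(probs, probs.sum(dim=1))            # each row sums to ~1

loss = criterion(logits, target)          # the criterion still receives the raw logits
```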

So do I use log_softmax() on the output with nn.CrossEntropyLoss? Or log_softmax() with nn.NLLLoss()?

You can use:

  • raw logits (no activation function at the end, just the raw output of the last layer) + nn.CrossEntropyLoss
  • F.log_softmax on the model output + nn.NLLLoss

If you need to see the probabilities for debugging/printing purposes:

  • if you trained with raw logits + nn.CrossEntropyLoss, apply softmax to the logits and print the output
  • if you trained with F.log_softmax + nn.NLLLoss, apply exp() to the log probabilities and print the output

Make sure to not pass the probabilities to the loss functions.
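Both options side by side in one sketch (the model and data are made up; in both cases only the unnormalized outputs reach the criterion):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(10, 2)                  # hypothetical last layer, no activation
x = torch.randn(4, 10)
target = torch.randint(0, 2, (4,))

# Option 1: raw logits + nn.CrossEntropyLoss
logits = model(x)
loss = nn.CrossEntropyLoss()(logits, target)
print(F.softmax(logits, dim=1))           # probabilities, for printing only

# Option 2: F.log_softmax + nn.NLLLoss
log_probs = F.log_softmax(model(x), dim=1)
loss = nn.NLLLoss()(log_probs, target)
print(log_probs.exp())                    # probabilities, for printing only
```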