Classification using LogSoftmax vs Softmax and calculating the precision-recall curve?

In the case of binary classification we can get the final output using LogSoftmax or Softmax. With Softmax we get results that add up to 1. I understand that LogSoftmax penalizes a wrong classification more heavily and has a few other mathematical advantages.

I have a binary classification problem with class 1 occurring very rarely (<2% of the time).

My questions:

  1. If I am using a probability cutoff of 0.5 (predicting class 1 if the probability is above 0.5) with Softmax, will I get the same values for overall accuracy, class-1 recall, precision, and F1 as when using LogSoftmax (and using the lower output value as the predicted class)?
  2. How do I calculate the precision-recall curve when using LogSoftmax? This link says that “The precision-recall curve is constructed by calculating and plotting the precision against the recall for a single classifier at a variety of thresholds.” How are we going to choose those thresholds if the output is not between 0 and 1?

No, LogSoftmax doesn’t penalize a “more wrong” classification but applies the log to the softmax output in a numerically stable way.

  1. If you are working on a binary classification use case and are thinking about using a threshold, I assume your output has the shape [batch_size, 1] and you would be using nn.BCE(WithLogits)Loss. In this case no (Log)Softmax would be used, as you have a single output neuron. To get the prediction using a probability threshold you could use torch.sigmoid(output_logits) > threshold (see the sketch after this list).

  2. Again, LogSoftmax is used with e.g. nn.NLLLoss for multi-class classification.
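
For illustration, a minimal sketch of the single-output setup from point 1 (the nn.Linear model and the random tensors are placeholders, not your actual pipeline):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Linear(10, 1)                       # placeholder model with a single output neuron
x = torch.randn(8, 10)                         # dummy input batch
target = torch.randint(0, 2, (8, 1)).float()   # labels, shape [batch_size, 1]

logits = model(x)                              # raw logits, shape [batch_size, 1]
loss = nn.BCEWithLogitsLoss()(logits, target)  # sigmoid is applied internally

threshold = 0.5
preds = (torch.sigmoid(logits) > threshold).long()  # hard predictions in {0, 1}
```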


I am still not clear.

In both the Softmax and LogSoftmax cases, my neural network output has shape [batch_size, 2], and in both cases I am using cross_entropy(probs, labels). The ground-truth labels have shape [batch_size, 1]. Do I need to change anything?

Question 3: Based on your comment, is using LogSoftmax not useful since I am using cross_entropy?

Could you answer my questions 1 and 2?

  1. Just trying to rephrase: would LogSoftmax and Softmax give the exact same output?
  2. If I am using LogSoftmax, how do I get the precision-recall curve?

Please answer this post first… I created it to clear up my understanding.

Both LogSoftmax and Softmax are wrong if you are using nn.CrossEntropyLoss, as raw logits are expected and nn.CrossEntropyLoss will internally apply LogSoftmax.
For multi-class classification your target should have the shape [batch_size] and contain the class indices in [0, nb_classes-1]. Based on your description you are not working with nn.BCEWithLogitsLoss but are using nn.CrossEntropyLoss for a “2-class multi-class classification”. @KFrank describes the difference in your cross-post.
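
To make this concrete, a small sketch of the expected shapes and of what nn.CrossEntropyLoss does internally (the tensors here are random placeholders):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

logits = torch.randn(8, 2)              # raw model output, no (Log)Softmax applied
target = torch.randint(0, 2, (8,))      # shape [batch_size], class indices
# if your labels currently have shape [batch_size, 1],
# target.squeeze(1) gives the expected shape

loss = F.cross_entropy(logits, target)  # LogSoftmax + NLLLoss applied internally

# equivalent, making the internal steps explicit:
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), target)
print(torch.allclose(loss, loss_manual))  # True
```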

That is correct and you should remove it if you are using nn.CrossEntropyLoss. LogSoftmax is used with nn.NLLLoss.

No, since LogSoftmax applies the logarithm to the Softmax output (the argmax, and thus the predicted class, stays the same, since the logarithm is monotonic).
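
A quick sketch of the relationship: the values differ, but the argmax is identical:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 2)

probs = F.softmax(logits, dim=1)          # values in [0, 1], rows sum to 1
log_probs = F.log_softmax(logits, dim=1)  # values in (-inf, 0]

print(torch.allclose(log_probs, probs.log()))             # True (up to numerics)
print(torch.equal(log_probs.argmax(1), probs.argmax(1)))  # True: log is monotonic
```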

As described before (and in your cross-post) you might want to switch to nn.BCEWithLogitsLoss for a binary classification to be able to use thresholds to create the predictions.
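
A minimal sketch of how the precision-recall curve then falls out, using sklearn.metrics.precision_recall_curve on the sigmoid probabilities (logits and labels are random placeholders standing in for a trained model's output and the ground truth):

```python
import torch
from sklearn.metrics import precision_recall_curve

torch.manual_seed(0)
logits = torch.randn(100, 1)            # placeholder raw model output
labels = torch.randint(0, 2, (100,))    # placeholder ground-truth labels

probs = torch.sigmoid(logits).squeeze(1)  # probabilities in [0, 1]
precision, recall, thresholds = precision_recall_curve(labels.numpy(), probs.numpy())
# sklearn sweeps the thresholds over the predicted probabilities for you
```

If you keep the 2-output (Log)Softmax setup instead, log_probs[:, 1].exp() (or probs[:, 1]) would give the class-1 probability to feed into the same function.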