I am taking a course on deep learning. As part of the coursework I have to build a CNN classification project. In my project I used a softmax activation for the output layer, as I am interested in the probabilities instead of the scores. I also used the cross-entropy loss function, as it worked better for my problem. I got the expected results. I am already aware that the cross-entropy loss function uses a combination of PyTorch's log_softmax and NLLLoss behind the scenes.
But my project submission was rejected, and the reviewer's comment was that softmax activation should not be used with the cross-entropy loss function, according to the PyTorch documentation.
I'm seeking help from the experts here to understand why softmax and the cross-entropy loss function should not be used together in PyTorch.
I'd much appreciate your help.
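That’s correct, and you’ve already described the reason: nn.CrossEntropyLoss applies log_softmax followed by NLLLoss internally, so it expects raw logits as its input.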
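The equivalence can be checked numerically. This is a minimal sketch with random logits and targets; the batch size, number of classes, and seed are arbitrary choices for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(8, 5)           # raw model outputs: batch of 8, 5 classes
target = torch.randint(0, 5, (8,))   # ground-truth class indices

# nn.CrossEntropyLoss takes raw logits and applies log_softmax internally
ce = nn.CrossEntropyLoss()(logits, target)

# the same computation spelled out: log_softmax followed by NLLLoss
manual = F.nll_loss(F.log_softmax(logits, dim=1), target)

# and written from the formula: -x_y + log(sum_j exp(x_j)), averaged over the batch
picked = logits.gather(1, target.unsqueeze(1)).squeeze(1)
formula = (-picked + torch.logsumexp(logits, dim=1)).mean()

print(torch.allclose(ce, manual), torch.allclose(ce, formula))  # True True
```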
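If you apply a softmax on your output, the loss calculation would use:

```python
# what the loss effectively computes if the model already applies softmax (wrong)
loss = F.nll_loss(F.log_softmax(F.softmax(logits, dim=1), dim=1), target)
```

which is wrong based on the formula for the cross-entropy loss, due to the additional softmax.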
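To make the problem concrete, write p = softmax(x) for the probabilities the model now outputs for one sample with target class y out of C classes. The loss then treats p as if it were logits:

```latex
\ell(p, y) = -\log \frac{\exp(p_y)}{\sum_{j=1}^{C} \exp(p_j)}
           = \log\Bigl(1 + \sum_{j \neq y} \exp(p_j - p_y)\Bigr)
           \;\ge\; \log\Bigl(1 + \frac{C - 1}{e}\Bigr)
```

The bound follows because every p_j ≥ 0 and p_y ≤ 1, so each exp(p_j − p_y) ≥ e^{−1}. The value being minimized is therefore no longer the cross-entropy of the model's predicted probabilities, and it can never reach 0, even for a perfectly confident, correct prediction (the bound is about 1.46 for C = 10).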
Thank you. If I need probabilities as the model output at inference time, what should I do? I tried applying the softmax function outside the model architecture at inference time, but then I lose the topk function. If I need probabilities as well as functions like topk, what should I do?
Applying softmax to the output “outside the model” and not passing the result to the loss function would be the proper way to get the probabilities.
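A minimal sketch of that split, assuming a hypothetical toy CNN (the layer sizes, optimizer, learning rate, and dummy data are illustrative assumptions, not the original project): the model returns raw logits, the loss receives those logits during training, and the softmax is applied only in the inference code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical small classifier: the last layer returns raw logits (no softmax)
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),
)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Training step: the loss gets the logits directly
images = torch.randn(4, 3, 32, 32)      # dummy batch of 4 RGB images
labels = torch.randint(0, 10, (4,))
logits = model(images)
loss = criterion(logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Inference: apply softmax outside the model to get probabilities
model.eval()
with torch.no_grad():
    probs = F.softmax(model(images), dim=1)

print(probs.sum(dim=1))  # each row sums to 1
```

Keeping the softmax out of the model means nn.CrossEntropyLoss always sees logits, while the inference path still returns probabilities.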
I don’t fully understand this description, as softmax is monotonic and won’t change the order of the values, so topk returns the same indices:
```python
import torch
import torch.nn.functional as F

logits = torch.randn(10, 10)
preds = torch.topk(logits, k=3, dim=1).indices
prob = F.softmax(logits, dim=1)
preds_prob = torch.topk(prob, k=3, dim=1).indices
print((preds == preds_prob).all())
# > tensor(True)
```
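Since torch.topk returns both values and indices, calling it on the probabilities gives the top probabilities and their classes in one call. A minimal sketch, using random logits in place of real model outputs (the shapes and seed are arbitrary):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)            # stand-in for model outputs: 4 samples, 10 classes
probs = F.softmax(logits, dim=1)

# topk on the probabilities: values are the top-3 probabilities, indices are the classes
top_probs, top_classes = torch.topk(probs, k=3, dim=1)

# topk on the raw logits picks the same classes, because softmax preserves the order
logit_top = torch.topk(logits, k=3, dim=1).indices
same_classes = torch.equal(top_classes, logit_top)

# looking up the probabilities of the logit-based top-3 gives the same values
same_probs = torch.allclose(top_probs, probs.gather(1, logit_top))

print(same_classes, same_probs)  # True True
```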