Why softmax is not applied on the network output during the evaluation?

In the official PyTorch tutorials (and everywhere else), I found that when evaluating the model, the output of the network is passed directly to torch.max() to obtain the predicted labels. Isn’t this incorrect? Shouldn’t we apply softmax to the network output (which is simply the output of the final fully connected layer) and then take argmax to get the predicted labels? I understand that we don’t need to compute the softmax during training since the loss function does that internally - but why does torch.max() work during evaluation when it seems we should have used softmax, then argmax?

For example, the PyTorch CIFAR10 tutorial uses the following lines when evaluating the model:

outputs = net(images)
_, predicted = torch.max(outputs.data, 1) # shouldn’t it be argmax(softmax(output))?

correct += (predicted == labels).sum().item()

For prediction and computing accuracy, we only care about which class has the maximum value. Softmax is a monotonically increasing function, so it preserves the ordering of the logits - the largest logit always maps to the largest probability. Taking the argmax with or without softmax therefore gives the same result.
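A quick sketch to verify this, using hypothetical logits for a small batch (the values here are made up for illustration):

```python
import torch

# Hypothetical logits for a batch of 2 examples over 3 classes.
logits = torch.tensor([[2.0, 1.0, 0.5],
                       [0.1, 3.0, -1.0]])

# Softmax preserves the ordering within each row,
# so argmax gives the same labels either way.
pred_from_logits = torch.argmax(logits, dim=1)
pred_from_probs = torch.argmax(torch.softmax(logits, dim=1), dim=1)

print(torch.equal(pred_from_logits, pred_from_probs))  # True
```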


Oh, right! The max will always be the max - be it before or after softmax. We would only need softmax if we were interested in seeing the probability associated with each class.

Thanks a lot!