In the official PyTorch tutorials (and everywhere else), I noticed that when evaluating the model, the output of the network is passed directly to torch.max() to obtain the predicted labels. Isn’t this incorrect? Shouldn’t we apply softmax to the network output (which is simply the output of the final fully connected layer) and then take the argmax to get the predicted labels? I understand that we don’t need to compute the softmax during training, since the loss function does that internally, but why does torch.max() work during evaluation when we should have used softmax followed by argmax?
For example, the PyTorch CIFAR10 tutorial uses the following lines when evaluating the model:
outputs = net(images)
_, predicted = torch.max(outputs.data, 1) # shouldn’t it be argmax(softmax(output))?
correct += (predicted == labels).sum().item()
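
One quick way to check whether the two approaches can disagree is to compare them on random logits. The sketch below is plain Python rather than PyTorch, and the helper names `softmax` and `argmax`, the seed, and the random 10-class logits are all illustrative assumptions, not anything from the tutorial. Since softmax is strictly increasing in each logit, it should preserve their ordering, so I would expect the mismatch count to be zero (barring floating-point ties).

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest value (the first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


random.seed(0)

# Hypothetical raw network outputs: 1000 samples, 10 classes each.
batch = [[random.gauss(0.0, 3.0) for _ in range(10)] for _ in range(1000)]

# Count samples where argmax(logits) and argmax(softmax(logits)) differ.
mismatches = sum(argmax(row) != argmax(softmax(row)) for row in batch)
print("mismatches:", mismatches)
```

If the count is indeed zero, then in PyTorch `outputs.argmax(dim=1)` and `torch.softmax(outputs, dim=1).argmax(dim=1)` should give the same labels, and the softmax would only be needed when the actual class probabilities are wanted.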