Why do torch.max() of the predictions and torch.max() of F.softmax(pred) give the same result? What is the logic behind this?

I’m new to PyTorch. Sorry if my question is stupid.

I have a multiclass classification problem, and for it I have a convolutional neural network whose last layer is a Linear layer.

I get predictions from this model, and it gives me a tensor that has n_class elements. For example:
tensor([class_1, class_2, class_3])

import torch
import torch.nn as nn
import torch.nn.functional as F

class CNN(nn.Module):
    # Constructor
    def __init__(self):
        super(CNN, self).__init__()
        # cnn and maxpool layers
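        self.cnn_ = nn.Conv2d(in_channels=1,
                         # the remaining arguments were cut off in the post;
                         # out_channels is assumed from the fc1 input size below
                         out_channels=conv_kernel_output_channel,
                         kernel_size=(filter_size, embedding_size))
        self.maxpool_ = nn.MaxPool2d(kernel_size=(longest_sentence_length - filter_size + 1, 1))
        # dropout layer (note: p is the probability of zeroing an element)
        self.drop = nn.Dropout(p = dropout_keep_prob)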
        # fully connected layer
        self.fc1 = nn.Linear(conv_kernel_output_channel * len(filter_sizes), num_classes)
    # Prediction
    def forward(self, x):
        # convolution layer
        x1 = self.cnn_(x)

        # apply activation function
        x1 = torch.relu(x1)

        # maxpooling layer
        x1 = self.maxpool_(x1)

        # flatten output
        x1 = x1.view(x1.size(0), -1)
        # dropout
        x1 = self.drop(x1)
        # fully connected layer
        x1 = self.fc1(x1)
        # no softmax at the end, because the cross-entropy loss applies it internally
        return x1

model = CNN()
pred = model(x)
_, label_1 = torch.max(pred)

pred_soft = F.softmax(pred)
_, label_2 = torch.max(pred_soft )

1- Why does taking torch.max() of this prediction give us the label? I mean, why does our model produce larger values for the desired label? What is the logic behind this?

2- Why do torch.max() of this prediction and torch.max() of F.softmax() give us the same result? Why can we interpret them as the same, and is it enough to use one of them to get the predicted label?

Hi Ziba!

Your final Linear layer will produce* a set of raw-score logits
(unnormalized log-odds-ratios), one for each of the classes. These
are related to the probabilities that the network predicts for the sample
in question being in each of the classes, and, specifically, the class
probabilities are given by softmax() of the predicted logits.

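*) Your network produces such values in essence because you train
it to produce such values: the cross-entropy loss is smallest when the
logit for the correct class is large relative to the other logits.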
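
As a small sketch of the logits-to-probabilities relationship described
above (the logit values are made up for illustration and are not output
from your model):

```python
import torch
import torch.nn.functional as F

# made-up logits for one sample and three classes (illustrative values only)
logits = torch.tensor([[2.0, -1.0, 0.5]])

# softmax over the class dimension turns the logits into probabilities
probs = F.softmax(logits, dim=1)

print(probs)             # three values, each strictly between 0 and 1
print(probs.sum(dim=1))  # sums to 1, up to floating-point rounding
```

The logits themselves can be any real numbers; only after softmax() do
they behave like probabilities.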

Here you want _, label_1 = torch.max (pred, dim = 1) (assuming
that pred has shape [nBatch, nClass]). As written, your code will
produce an error. (Similarly, you want torch.max (pred_soft, dim = 1).)

With the corrected expression, torch.max() will return both the max(),
which gets assigned to the variable _ (used stylistically in python as a
“throw-away” variable), and the argmax() (the index of the maximum
element), which gets assigned to label_1. The largest logit corresponds
to the largest probability, and the index of the largest logit is the class
label for what the network is predicting as the most probable class.
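
To illustrate what torch.max() returns when given dim = 1, here is a
minimal sketch. It assumes pred has shape [nBatch, nClass]; the values
below are invented stand-ins for your model's output:

```python
import torch

# made-up logits with shape [nBatch, nClass] = [2, 3]
pred = torch.tensor([[0.1, 2.3, -0.7],
                     [1.5, 0.2, 0.9]])

# with dim given, torch.max() returns (max values, indices of the max values)
values, labels = torch.max(pred, dim=1)
print(values)  # the largest logit in each row
print(labels)  # the class index of that logit: 1 for row 0, 0 for row 1

# torch.argmax() returns only the indices, if the values are not needed
labels_argmax = torch.argmax(pred, dim=1)
```

If you only need the predicted labels, torch.argmax (pred, dim = 1) is
an equivalent, slightly more direct way to get them.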

The logits, pred, and the probabilities, F.softmax (pred), are different
numbers, but softmax() preserves their order: it exponentiates each logit
(exp() is strictly increasing) and divides by the same positive sum for
every class. So the largest logit and the largest probability correspond to
one another (as do the second largest, and so on), and the index of the
largest logit (argmax (pred)) and the index of the largest probability
(argmax (F.softmax (pred))) are the same; that is, they both give
you the same predicted class label.
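
You can check this yourself with a short sketch like the following
(again using invented logits in place of your model's output):

```python
import torch
import torch.nn.functional as F

# made-up logits with shape [nBatch, nClass] = [2, 3]
pred = torch.tensor([[0.1, 2.3, -0.7],
                     [1.5, 0.2, 0.9]])
pred_soft = F.softmax(pred, dim=1)

# predicted labels from the logits and from the probabilities
_, label_1 = torch.max(pred, dim=1)
_, label_2 = torch.max(pred_soft, dim=1)
print(torch.equal(label_1, label_2))  # True

# the full ranking of the classes is preserved too, not just the top one
order_logits = torch.argsort(pred, dim=1, descending=True)
order_probs = torch.argsort(pred_soft, dim=1, descending=True)
print(torch.equal(order_logits, order_probs))  # True
```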


K. Frank