# Why do torch.max() of the predictions and of F.softmax(pred) give the same result, and what is the logic behind this?

I’m new to PyTorch. Sorry if my question is stupid.

I have a multiclass classification problem, and for it I have a convolutional neural network whose last layer is a `Linear` layer.

I get predictions from this model, and each prediction is a tensor with n_class elements. For example:
`tensor([class_1, class_2, class_3])`

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hyperparameters (conv_kernel_output_channel, filter_size, filter_sizes,
# embedding_size, longest_sentence_length, dropout_keep_prob, num_classes)
# and the input batch x are defined elsewhere.

class CNN(nn.Module):

    # Constructor
    def __init__(self):

        super(CNN, self).__init__()

        # cnn and maxpool layers
        self.cnn_ = nn.Conv2d(in_channels=1,
                              out_channels=conv_kernel_output_channel,
                              kernel_size=(filter_size, embedding_size),
                              stride=1)
        self.maxpool_ = nn.MaxPool2d(kernel_size=(longest_sentence_length - filter_size + 1, 1),
                                     stride=1)

        # dropout layer
        self.drop = nn.Dropout(p=dropout_keep_prob)

        # fully connected layer
        self.fc1 = nn.Linear(conv_kernel_output_channel * len(filter_sizes), num_classes)

    # Prediction
    def forward(self, x):
        # convolution layer
        x1 = self.cnn_(x)

        # apply activation function
        x1 = torch.relu(x1)

        # maxpooling layer
        x1 = self.maxpool_(x1)

        # flatten output
        x1 = x1.view(x1.size(0), -1)

        # dropout
        x1 = self.drop(x1)

        # fully connected layer
        x1 = self.fc1(x1)

        # no softmax at the end, because cross entropy loss has it implicitly

        return x1

# instantiate the model first, then call it on the input
model = CNN()
pred = model(x)
_, label_1 = torch.max(pred)

pred_soft = F.softmax(pred)
_, label_2 = torch.max(pred_soft)
```

Questions:
1- Why does taking `torch.max()` of this prediction give us the label? I mean, why does our model produce larger values for the desired label? What is the logic behind this?

2- Why does taking `torch.max()` of this prediction and of `F.softmax()` give us the same result? Why can we interpret them as the same, and is it enough to use just one of them to get the predicted label?

Hi Ziba!

Your final `Linear` layer will produce* a set of raw-score logits
(unnormalized log-odds-ratios), one for each of the classes. These
are related to the probabilities that the network predicts for the sample
in question being in each of the classes, and, specifically, the class
probabilities are given by `softmax()` of the predicted logits.

*) Your network produces such values in essence because you train
it to produce such values.
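
To make the logit-to-probability relationship concrete, here is a minimal plain-Python sketch of what `softmax()` computes. The helper `softmax` and the logit values are made up for illustration; in practice you would use PyTorch's built-in function rather than this hand-written version.

```python
import math

def softmax(logits):
    """Plain-Python softmax: exponentiate, then normalize so the values sum to 1."""
    # Subtracting the max does not change the result but avoids overflow in exp().
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up logits for one sample and three classes (illustrative values only).
logits = [2.0, -1.0, 0.5]
probs = softmax(logits)

print(probs)       # three positive numbers, one per class
print(sum(probs))  # approximately 1.0
```

The logits can be any real numbers, while the resulting probabilities are all positive and sum to one.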

Here you want `_, label_1 = torch.max (pred, dim = 1)` (assuming
that `pred` has shape `[nBatch, nClass]`). As written, your code will
produce an error. (Similarly, you want `torch.max (pred_soft, dim = 1)`.)

With the corrected expression, `torch.max()` will return both the `max()`,
which gets assigned to the variable `_` (used stylistically in python as a
“throw-away” variable), and the `argmax()` (the index of the maximum
element), which gets assigned to `label_1`. The largest logit corresponds
to the largest probability, and the index of the largest logit is the class
label for what the network is predicting as the most probable class.
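
As a rough illustration of the `dim = 1` form, the sketch below uses a made-up batch of logits (2 samples, 3 classes) in place of the real network output. The variable names `max_logit` and `max_prob` are introduced here only to show what would otherwise be thrown away in `_`.

```python
import torch

# Made-up logits for a batch of 2 samples and 3 classes (illustrative values only).
pred = torch.tensor([[1.5, -0.3, 0.2],
                     [0.1, 0.4, 2.2]])

# With dim=1, torch.max() reduces over the class dimension and returns
# a (values, indices) pair, each of shape [nBatch].
max_logit, label_1 = torch.max(pred, dim=1)

# The same reduction applied to the softmax probabilities.
pred_soft = torch.softmax(pred, dim=1)
max_prob, label_2 = torch.max(pred_soft, dim=1)

print(label_1)  # index of the largest logit in each row
print(label_2)  # index of the largest probability in each row
print(torch.equal(label_1, label_2))
```

If only the label is needed, `torch.argmax (pred, dim = 1)` returns the indices directly.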

The logits, `pred`, and the probabilities, `F.softmax (pred)`, are different
numbers, but the largest logit and the largest probability correspond to
one another (as do the second largest, and so on). So the index of the
largest logit (`argmax (pred)`) and the index of the largest probability
(`argmax (F.softmax (pred))`) are the same – that is, they both give
you the same predicted class label.
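
One way to see why the ordering is preserved is to write out the softmax formula. The symbols here are introduced just for this sketch: $z_i$ is the logit for class $i$ and $p_i$ is the corresponding probability.

$$
p_i = \frac{e^{z_i}}{\sum_j e^{z_j}},
\qquad
z_i > z_k \;\Longleftrightarrow\; e^{z_i} > e^{z_k} \;\Longleftrightarrow\; p_i > p_k .
$$

The exponential is strictly increasing and every class shares the same positive denominator, so sorting the classes by logit or by probability gives the same order.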

Best.

K. Frank