Hi Hiep!
The short answer is that nn.functional.cross_entropy one-hots your class labels for you.
The number-one rule is that the output of your network means whatever you train it to mean.
More directly to your question: You are using nn.functional.cross_entropy as your loss function. Your yb are integer class labels, one per sample. Conceptually, under the hood, cross_entropy implicitly one-hots your yb, implicitly softmaxes your pred, and then calculates the cross-entropy of “softmax (pred)” and “one-hot (yb)”.
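Here is a minimal sketch (with made-up shapes and tensor values) of what that amounts to. It checks that cross_entropy applied to raw pred and integer yb matches the “manual” version: one-hot the labels, log-softmax the logits, and average the per-sample cross-entropies:

```python
import torch
import torch.nn.functional as F

nBatch, nClass = 4, 3
pred = torch.randn(nBatch, nClass)      # raw logits from the model, shape [nBatch, nClass]
yb = torch.tensor([0, 2, 1, 2])         # integer class labels, shape [nBatch]

# what cross_entropy does for you
loss = F.cross_entropy(pred, yb)

# the same thing spelled out "by hand"
one_hot = F.one_hot(yb, num_classes=nClass).float()
manual = -(one_hot * F.log_softmax(pred, dim=1)).sum(dim=1).mean()

print(torch.allclose(loss, manual))     # True
```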
The output of your model (the y_pred) should be understood as logits. They get turned into probabilities when cross_entropy (implicitly) softmaxes them. So you are training your model to output (for each sample) a vector of length nClass, where the value at index i is the logit (roughly, the unnormalized log-probability) of that sample being of class i. (Finally, you take the argmax of your prediction vector. This finds the index of the logit with the largest value; that is the class your model predicts as most probable, and you take it as the predicted class label.)
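For example (with hypothetical logit values), taking the argmax along the class dimension gives the predicted label for each sample:

```python
import torch

pred = torch.tensor([[ 1.2, -0.3,  0.5],
                     [-0.7,  2.1,  0.0]])   # logits for two samples, nClass = 3

predicted_labels = pred.argmax(dim=1)       # index of the largest logit per sample
print(predicted_labels)                     # tensor([0, 1])
```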
“PyTorch knows to link the index of y_pred with the correct label” because you trained your network to do so.
Good Luck.
K. Frank