So I am using the cross-entropy loss function in my CNN classification task. But I am not sure whether it is appropriate to compare my labels, which are integers starting from 0 (e.g. 0, 1, 2), with the outputs, which come out of a softmax and lie in the range 0 to 1.
So if we compare these two to compute the loss, won't that be really inaccurate? Should I apply argmax to my outputs first to convert them to integers before comparing with my actual labels? Thanks in advance
Hi L!
Pytorch's CrossEntropyLoss takes the raw output of your model, that is, the output of your model's final Linear layer without any following softmax() (or other "activation"). These are to be understood as the unnormalized log-probabilities of each of the three classes. (CrossEntropyLoss has log_softmax() built into it.)

CrossEntropyLoss knows how to compare your integer class labels with the (unnormalized log-probability) outputs of your model.

Just to emphasize, you should not have any sort of softmax() between your model's last Linear layer and CrossEntropyLoss.
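For example, a minimal sketch with made-up shapes (a batch of four samples and three classes, not your actual model):

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

logits = torch.randn(4, 3, requires_grad=True)  # raw output of the final Linear layer
labels = torch.tensor([0, 2, 1, 0])             # integer class labels, no one-hot encoding

loss = criterion(logits, labels)  # log_softmax() is applied internally
loss.backward()                   # gradients flow back to the logits
```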
No. Independent of the previous discussion, those integers are discrete and therefore "not differentiable." So using argmax() in this way will "break the computation graph" and prevent backpropagation. (You would certainly do this to compute a performance metric like accuracy, but not to compute a loss function that needs to be differentiable in order to backpropagate.)
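Something like this hypothetical snippet shows the distinction: argmax() is fine for a metric such as accuracy, but its result carries no gradient.

```python
import torch

logits = torch.randn(4, 3, requires_grad=True)
labels = torch.tensor([0, 2, 1, 0])

preds = logits.argmax(dim=1)  # integer class indices, fine for metrics
print(preds.requires_grad)    # False -- the graph is cut here, so no backprop

accuracy = (preds == labels).float().mean()
```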
Best.
K. Frank
Hi Frank, thank you for answering my question. So if I need to feed the output from my Linear layer directly into my CrossEntropyLoss function, then when should I apply the softmax? Or do I not actually need the softmax to find out which class I should classify into?
Hi L!
Correct, you do not need softmax() to predict a specific class. This is because we (usually) predict a specific class by taking the specific class for which the predicted probability – the result of softmax() – is largest.

But softmax() doesn't change the relative ordering of its inputs. That is, letting pred be the output of your model (and thus the unnormalized log-probabilities predicted by your model), pred.softmax(-1).argmax() == pred.argmax(). So you can simply apply argmax() to the output of your model to get the specific predicted class without first applying softmax() (but you can apply softmax() – it doesn't hurt anything except for taking a tiny bit of extra time).
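A quick sanity check of this, with made-up logits standing in for your model's output:

```python
import torch

pred = torch.randn(4, 3)  # pretend these are the unnormalized log-probabilities

class_a = pred.argmax(dim=-1)                  # predicted class straight from the logits
class_b = pred.softmax(dim=-1).argmax(dim=-1)  # predicted class after softmax()

print(torch.equal(class_a, class_b))  # True -- softmax() preserves the ordering
```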
Best.
K. Frank
I have printed out the output right after nn.Linear and I have realized it is a negative number and a positive number for a classification with 2 classes. Will CrossEntropyLoss work fine with negative values?
Hi L!
The output of your final Linear layer should be understood as unnormalized log-probabilities. What would a negative number mean in this context?

What happens when you pass a negative number through log_softmax(), as CrossEntropyLoss does internally? What happens when you pass a negative number through softmax()? How should the result of that be interpreted?

Let pred be the output of your final Linear layer. What is the difference between pred.softmax(-1) and (pred - 10).softmax(-1)?
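You can explore these questions numerically. Here is a small sketch with made-up logits (not your actual outputs):

```python
import torch

pred = torch.tensor([2.5, -1.0])  # made-up logits; note the negative value

print(pred.softmax(-1))         # tensor([0.9707, 0.0293])
print((pred - 10).softmax(-1))  # subtract a constant from all logits and compare
print(pred.log_softmax(-1))     # what CrossEntropyLoss applies internally
```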
Best.
K. Frank
Yeah, that's why I am not sure where the negative values come from. For example, the output before I apply softmax() is tensor([6.0575, -5.3307]), and after applying softmax() it is tensor([9.9999e-01, 1.1327e-05]). log_softmax() gives me tensor([-1.1325e-05, -1.1389e+01]). But the result of argmax() is the same throughout, and the result of (pred - 10).softmax(-1) is the same as pred.softmax(-1). Thank you.