So I am using the cross-entropy loss function in my CNN classification task. But I am not sure whether it is appropriate to compare my labels, which are integers starting from 0 (e.g. 0, 1, 2), against the outputs, which come from a softmax and are in the range 0 to 1.

So if we compare these two to find the loss, won’t that be really inaccurate? Should I apply argmax to my outputs first, to convert them to integers, before comparing them with my actual labels? Thanks in advance.

Hi L!

Pytorch’s `CrossEntropyLoss` takes the raw output of your model, that is, the output of your model’s final `Linear` layer *without* any following `softmax()` (or other “activation”). These are to be understood as the unnormalized log-probabilities of each of the three classes. (`CrossEntropyLoss` has `log_softmax()` built into it.)

`CrossEntropyLoss` knows how to compare your integer class labels with the (unnormalized log-probabilities) outputs of your model.

Just to emphasize, you should *not* have any sort of `softmax()` between your model’s last `Linear` layer and `CrossEntropyLoss`.
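For example, a minimal sketch of this pattern (the layer sizes, batch size, and labels here are made up for illustration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
final_linear = nn.Linear(8, 3)       # last layer of the model, 3 classes
criterion = nn.CrossEntropyLoss()    # log_softmax() is built in

features = torch.randn(4, 8)         # output of earlier layers (batch of 4)
logits = final_linear(features)      # raw, unnormalized log-probabilities
labels = torch.tensor([0, 2, 1, 0])  # integer class labels

loss = criterion(logits, labels)     # no softmax() in between
loss.backward()                      # gradients flow back to the model
```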

No, independent of the previous discussion, those integers are discrete and therefore “not differentiable.” So using `argmax()` in this way will “break the computation graph” and prevent backpropagation. (You would certainly do this to compute a performance metric like accuracy, but not to compute a loss function that needs to be differentiable in order to backpropagate.)
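For example, a small sketch of that distinction, with made-up logits and labels:

```python
import torch

logits = torch.tensor([[6.1, -5.3], [-0.2, 0.9]], requires_grad=True)
labels = torch.tensor([0, 1])

# argmax() is fine for a metric such as accuracy:
with torch.no_grad():
    predicted = logits.argmax(dim=-1)  # integer class indices, no grad_fn
    accuracy = (predicted == labels).float().mean()

# But argmax() cannot be part of the loss: logits.argmax(dim=-1) has no
# grad_fn, so anything computed from it cannot call .backward().
```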

Best.

K. Frank

Hi Frank, thank you for answering my question. So if I should feed the output of my Linear layer directly into my CrossEntropyLoss function, when should I apply the softmax? Or do I not actually need the softmax to find out which class to classify into?

Hi L!

Correct, you do not need `softmax()` to predict a specific class. This is because we (usually) predict a specific class by taking the specific class for which the predicted probability – the result of `softmax()` – is largest.

But `softmax()` doesn’t change the relative ordering of its inputs. That is, letting `pred` be the output of your model (and thus the unnormalized log-probabilities predicted by your model), `pred.softmax(-1).argmax() == pred.argmax()`. So you can simply apply `argmax()` to the output of your model to get the specific predicted class without first applying `softmax()` (but you can apply `softmax()` – it doesn’t hurt anything except for taking a tiny bit of extra time).
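For example (with made-up logits for three classes):

```python
import torch

pred = torch.tensor([2.0, -1.0, 0.5])  # made-up model output

print(pred.argmax())               # tensor(0)
print(pred.softmax(-1).argmax())   # tensor(0) -- same predicted class
```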

Best.

K. Frank

I have printed out the output right after nn.Linear, and I have realized it is a negative number and a positive number for a classification with 2 classes. Will CrossEntropyLoss work fine with negative values?

Hi L!

The output of your final `Linear` layer should be understood as unnormalized log-probabilities. What would a negative number mean in this context?

What happens when you pass a negative number through `log_softmax()`, as `CrossEntropyLoss` does internally? What happens when you pass a negative number through `softmax()`? How should the result of that be interpreted?

Let `pred` be the output of your final `Linear` layer. What is the difference between `pred.softmax(-1)` and `(pred - 10).softmax(-1)`?
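(You can check these questions numerically in a couple of lines; the logits here are made up:)

```python
import torch

pred = torch.tensor([6.0, -5.0])  # made-up two-class logits, one negative

print(pred.softmax(-1))           # probabilities -- both in (0, 1)
print(pred.log_softmax(-1))       # what CrossEntropyLoss sees internally
print((pred - 10).softmax(-1))    # identical to pred.softmax(-1)
```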

Best.

K. Frank

Yeah, that’s why I am not sure where the negative values come from. For example, the output before I apply the softmax is `tensor([6.0575, -5.3307])`, and after I apply `softmax()` it is `tensor([9.9999e-01, 1.1327e-05])`. `log_softmax()` gives me `tensor([-1.1325e-05, -1.1389e+01])`. But the result of `argmax()` is the same throughout. And the result of `(pred - 10).softmax(-1)` is the same as `pred.softmax(-1)`. Thank you.