Yeah, that’s why I am not sure where the negative values come from. For example, the output before I apply softmax is tensor([6.0575, -5.3307]); after softmax() it is tensor([9.9999e-01, 1.1327e-05]), and log_softmax() gives tensor([-1.1325e-05, -1.1389e+01]). The result of argmax() is the same in all three cases, and (pred - 10).softmax(-1) gives the same result as pred.softmax(-1). Thank you.
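
For reference, here is a minimal sketch that reproduces these observations in plain Python, without torch. The helper functions softmax, log_softmax and argmax are written by hand for illustration and are only assumed to mirror what the torch methods compute. It uses the logits quoted above. Since log_softmax is the logarithm of a probability, and every probability is at most 1, its values can never be positive. Subtracting a constant from every logit cancels out in the softmax ratio.

```python
import math

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def log_softmax(logits):
    # log(softmax(x)_i) = x_i - log(sum_j exp(x_j)), computed stably.
    m = max(logits)
    log_total = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_total for x in logits]

def argmax(values):
    # Index of the largest entry.
    return max(range(len(values)), key=lambda i: values[i])

pred = [6.0575, -5.3307]  # logits quoted in the post

probs = softmax(pred)                            # values in (0, 1], summing to 1
log_probs = log_softmax(pred)                    # logs of those values, so <= 0
shifted_probs = softmax([x - 10 for x in pred])  # same as probs up to rounding

print(probs)
print(log_probs)
print(argmax(pred), argmax(probs), argmax(log_probs))
```

Both softmax and log are monotonically increasing, so the ordering of the entries is preserved, which is why argmax() agrees on the raw output, the softmax output and the log_softmax output.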