# Does cross entropy loss implicitly apply log_softmax?

Hi folks,
I’m a bit confused about the proper usage of cross entropy loss and log_softmax.
I’ve read somewhere that `nn.CrossEntropyLoss()` implicitly applies `nn.LogSoftmax` on the output from your net, is that true?
In that case is the implementation here wrong?

I’ve also read that if you want to be more verbose you could use `nn.NLLLoss()` together with `torch.nn.functional.log_softmax()`, is that true? In some experiments with small MLPs, the above combination didn’t yield as good results as simply using `nn.CrossEntropyLoss()`. Are there any other intrinsic differences we should be aware of?

Thanks!

The implementation does indeed look wrong, as the code seems to combine `nn.LogSoftmax` with `nn.CrossEntropyLoss`.

Yes, you can see it in this line of code.

Both approaches should yield the same results.
The only possible mistake I could think of is specifying the wrong `dim` in `F.log_softmax`.
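To double-check that equivalence, here is a minimal sketch (the tensor shapes, seed, and number of classes are arbitrary) comparing `nn.CrossEntropyLoss` on raw logits against `nn.NLLLoss` applied to `F.log_softmax` outputs:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(8, 5)             # raw, unnormalized network outputs
target = torch.randint(0, 5, (8,))     # class indices

# CrossEntropyLoss expects raw logits and applies log_softmax internally
ce = nn.CrossEntropyLoss()(logits, target)

# NLLLoss expects log-probabilities, so we apply log_softmax explicitly
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), target)

print(torch.allclose(ce, nll))  # True: the two formulations match
```

So with the correct `dim`, any difference between the two setups in training should come from other sources (e.g. initialization or data shuffling), not from the loss itself.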

Thank you, that cleared a lot of things up. The dim I was using in `F.log_softmax` was `-1`.
If you don’t mind me asking another stupid question: I’ve noticed that `torch.topk()` with `k=1` works like `torch.max()` on a 2-dim input `x` over `dim=1`. Am I correct in assuming that? When I first saw `torch.topk()` with `k=1` it confused me a lot, as I was expecting just one value to be returned, namely the single top result.

Yes, both methods will return the same outputs:

```
import torch

x = torch.randn(10, 5)
print(torch.max(x, dim=1, keepdim=True))
print(torch.topk(x, 1, dim=1))
```
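To verify this programmatically rather than by eye, a quick check (assuming the random input has no ties along `dim=1`, which is virtually certain with `randn`):

```python
import torch

torch.manual_seed(0)
x = torch.randn(10, 5)

# Both return (values, indices); keepdim=True keeps shape (10, 1) like topk
max_vals, max_idx = torch.max(x, dim=1, keepdim=True)
topk_vals, topk_idx = torch.topk(x, k=1, dim=1)

print(torch.equal(max_vals, topk_vals))  # True
print(torch.equal(max_idx, topk_idx))    # True
```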

Thanks a lot, that makes sense. Although thinking about it, you would intuitively expect `topk` with `k=1` to return just the top k results, as the function name says, and in this case that would have been a single value.