I’m a bit confused about the proper usage of cross-entropy loss and log_softmax.
I’ve read somewhere that nn.CrossEntropyLoss() implicitly applies nn.LogSoftmax to the output of your net, is that true?
In that case is the implementation here wrong?
I’ve also read that if you want to be more verbose you could use nn.NLLLoss() together with torch.nn.functional.log_softmax() (usually imported as F.log_softmax), is that true? In some experiments with small MLPs, that combination didn’t yield results as good as simply using nn.CrossEntropyLoss(). Are there any other intrinsic differences we should be aware of?
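To check the equivalence myself I put together a small sketch (the logits and targets below are just made-up dummy values), comparing nn.CrossEntropyLoss() on raw logits against the explicit F.log_softmax + nn.NLLLoss combination:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)            # dummy raw scores from a net (batch=4, classes=3)
targets = torch.tensor([0, 2, 1, 0])  # dummy class labels

# nn.CrossEntropyLoss expects raw logits: it applies log_softmax internally.
ce = nn.CrossEntropyLoss()(logits, targets)

# Equivalent two-step version: log_softmax over the class dim, then NLLLoss.
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)

print(torch.allclose(ce, nll))  # prints True
```

So numerically the two formulations agree; if I got different results in training, the difference presumably came from somewhere else (e.g. the dim passed to log_softmax) rather than the losses themselves.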
Thank you, it cleared a lot of things up. The dim I was using in F.log_softmax was -1.
If you don’t mind me asking another stupid question: I’ve noticed that torch.topk() with k=1 actually works like torch.max() on a 2-dim input x over dim=1. Am I correct in assuming that? When I first saw torch.topk() with k=1 it confused me a lot, as I was expecting just one value to be returned, i.e. the single top result.
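A quick sanity check of what I mean, assuming a small made-up 2-dim tensor: the values and indices agree, and only the shape differs, since topk keeps the reduced dimension while max squeezes it:

```python
import torch

x = torch.tensor([[0.1, 0.9, 0.3],
                  [0.7, 0.2, 0.5]])  # dummy 2-dim input

top_vals, top_idx = torch.topk(x, k=1, dim=1)  # shapes (2, 1): reduced dim is kept
max_vals, max_idx = torch.max(x, dim=1)        # shapes (2,): reduced dim is squeezed

# Same values and indices once the extra dim is squeezed away.
print(torch.equal(top_vals.squeeze(1), max_vals))  # prints True
print(torch.equal(top_idx.squeeze(1), max_idx))    # prints True
```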
Thanks a lot, it makes sense. Although, thinking about it, you would intuitively expect topk with k=1 to return just the top k results, as the function name says, and in this case that would have been a single value.