Should I use softmax activation before cross-entropy loss backward?

I know nn.CrossEntropyLoss() incorporates softmax, so my model does not add a softmax layer to its output. But before I call loss.backward(), should I convert the outputs with softmax, e.g. outputs = torch.softmax(outputs, dim=1).cpu().detach().numpy()?

As you said, you should not apply softmax to the outputs before passing them to nn.CrossEntropyLoss.

If you’ve calculated the loss from the raw logits and now want the probabilities for debugging or inspection, then you can of course apply the softmax afterwards.
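A minimal sketch of what that looks like, assuming a toy setup with 4 samples and 3 classes (the shapes and tensor names are just for illustration):

```python
import torch
import torch.nn as nn

# Toy example: 4 samples, 3 classes (assumed shapes for illustration).
logits = torch.randn(4, 3, requires_grad=True)   # raw model outputs, no softmax applied
targets = torch.tensor([0, 2, 1, 2])

criterion = nn.CrossEntropyLoss()   # internally applies log_softmax + NLL
loss = criterion(logits, targets)   # pass the raw logits directly
loss.backward()                     # gradients are computed w.r.t. the raw logits

# Only afterwards, for debugging/inspection: convert logits to probabilities.
probs = torch.softmax(logits, dim=1).detach().cpu().numpy()
print(probs)
```

The key point is that the softmax here is applied on a detached copy purely for viewing the probabilities; it plays no part in the loss computation or the backward pass.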