Adam optimizer with softmax gives grad_fn=<SoftmaxBackward0>

AlphaBetaGamma96 · June 5, 2024, 12:11pm

When you use the softmax function, the model’s raw outputs are turned into a probability for each class, so you’re predicting a class given an input (your model predicts the first class, out of 2 available classes).
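To make that concrete, here is a minimal framework-free sketch of what softmax does to two raw outputs. The logit values are made up for illustration and are not taken from your model.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability; the result is unchanged.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs (logits) for a 2-class problem.
logits = [2.0, -1.0]

# Softmax maps them to probabilities that sum to 1.
probs = softmax(logits)

# The predicted class is the index with the highest probability.
predicted_class = max(range(len(probs)), key=probs.__getitem__)

print(probs, predicted_class)
```

Here the first class (index 0) gets most of the probability mass, which is the "predicts the first class" situation described above.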

In the second case, you’re just outputting the logits. If you fit your model to those directly, you’re no longer predicting classes based on a probability, you’re just fitting the raw output, which isn’t the same thing.
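A rough sketch of why the two aren’t the same, in plain Python. The squared-error loss on raw logits is only my assumption about what "fitting the output" might look like, and the numbers are hypothetical.

```python
import math

def log_softmax(logits):
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]

def cross_entropy(logits, target):
    # Probability-based loss: negative log-probability of the true class.
    return -log_softmax(logits)[target]

def squared_error(logits, target):
    # Assumed stand-in for fitting the raw outputs to a one-hot target.
    one_hot = [1.0 if i == target else 0.0 for i in range(len(logits))]
    return sum((x - t) ** 2 for x, t in zip(logits, one_hot))

# Hypothetical logits that are confidently correct for class 0.
confident = [10.0, -10.0]
target = 0

ce = cross_entropy(confident, target)  # close to zero
se = squared_error(confident, target)  # large, despite the correct prediction

print(ce, se)
```

The probability-based loss is happy with a confident, correct prediction, while the loss on the raw outputs still penalises it heavily because the logits are far from the 0/1 targets.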

If you want to read more, there’s a nice thread here: Logits vs. log-softmax - #2 by KFrank