Yes, softmax is needed when you want to compute probabilities, but CrossEntropyLoss handles that internally (see the first line of the linked docs).
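A minimal sketch of that point (shapes are assumptions: batch of 4, 2 classes) — `CrossEntropyLoss` takes raw logits, so no softmax layer is needed:

```python
import torch
import torch.nn as nn

# CrossEntropyLoss applies log-softmax internally, so feed it raw logits.
logits = torch.randn(4, 2)            # assumed: batch of 4, 2 classes
targets = torch.tensor([0, 1, 1, 0])  # class indices, not one-hot

loss = nn.CrossEntropyLoss()(logits, targets)
```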
If you really want to apply softmax yourself, you have to pair NLLLoss with the log of the softmax (i.e. log_softmax).
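For reference, a sketch of that pairing (same assumed tensors as above) — `log_softmax` followed by `NLLLoss` reproduces `CrossEntropyLoss`:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.randn(4, 2)
targets = torch.tensor([0, 1, 1, 0])

# log_softmax + NLLLoss is equivalent to CrossEntropyLoss on raw logits.
loss_nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)
loss_ce = nn.CrossEntropyLoss()(logits, targets)
assert torch.allclose(loss_nll, loss_ce)
```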
But now that I look at it, since you have only two classes you can use BCEWithLogitsLoss in the first place (then your last linear layer has 25 inputs and 1 output, and there is no need to apply a sigmoid there).
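A sketch of that binary setup, assuming the 25-feature input from above; `BCEWithLogitsLoss` fuses the sigmoid into the loss, which is also more numerically stable:

```python
import torch
import torch.nn as nn

model = nn.Linear(25, 1)  # last layer: 25 inputs, 1 output, no sigmoid
x = torch.randn(4, 25)    # assumed: batch of 4

# BCEWithLogitsLoss expects float targets with the same shape as the logits.
targets = torch.tensor([[0.], [1.], [1.], [0.]])

# The sigmoid is applied internally, so pass the raw logits.
loss = nn.BCEWithLogitsLoss()(model(x), targets)
```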