How would I go about getting the probabilities of the output classes? For example, now I get a 0 or 1, but I would want something like 0.75 (for 0) and 0.25 (for 1). Do I have to use a softmax layer somehow? And would this only need to be done for the testing portion or will I also need to make some change for the training as well?

If you are using 2 output units for the binary classification use case, and thus torch.argmax to calculate the predictions, you could use F.softmax to get the probabilities.
Note that you won't need these probabilities to calculate the loss if you are using nn.CrossEntropyLoss, but you can of course inspect them.

Just to be clear, do you mean to just add a last F.softmax layer when getting my test predictions?

And yes, I am using nn.CrossEntropyLoss, so I won't need it to calculate the loss, but I want to be formal here and use softmax for another operation.

import torch.nn.functional as F

output = model(data)  # output contains logits
# calculate the loss from the logits directly via nn.CrossEntropyLoss
loss = criterion(output, target)
# probabilities for inspection only; don't pass them to nn.CrossEntropyLoss
probs = F.softmax(output, dim=1)
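To make this concrete, here is a plain-Python sketch of what F.softmax computes (the logit values are made up for illustration). It shows that the probabilities sum to 1 and that the argmax over logits and over probabilities agree, since softmax is monotonic, which is why you don't need softmax just to get the predicted labels:

```python
import math

def softmax(logits):
    # subtract the max for numerical stability (the same trick F.softmax uses internally)
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# hypothetical logits for one sample with 2 output units
logits = [1.2, -0.4]
probs = softmax(logits)

print(probs)                 # roughly [0.83, 0.17]
print(sum(probs))            # 1.0 up to float rounding
# argmax over logits and over probabilities picks the same class
print(logits.index(max(logits)) == probs.index(max(probs)))  # True
```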

Thank you so much for this. Since the pre-trained models are mostly used for classification, why can't we have softmax as the final layer, so that output = model(data) gives the probabilities directly? Wouldn't that be easier? We could then take the argmax of it if we need labels.

I'm not sure if I'm right, but I think the current pre-trained models output logits. Maybe we could have a flag to output either logits or probabilities (or both).

I think the main reason to output logits is that commonly used loss functions such as nn.CrossEntropyLoss (multi-class classification) and nn.BCEWithLogitsLoss (multi-label classification) expect logits, not probabilities. Applying the softmax manually would reduce the numerical precision without much benefit besides being able to "see" the probabilities for debugging purposes.
Adding the F.softmax activation is trivial and (for these classification models) doesn't serve any purpose besides printing the probabilities (you should not pass the probabilities to the mentioned loss functions).
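To illustrate the precision argument with a small pure-Python sketch (the extreme logit values are made up for demonstration): computing log(softmax(x)) in two separate steps underflows and fails, while the fused log-softmax formulation based on the log-sum-exp trick, which is how nn.CrossEntropyLoss works internally, stays exact:

```python
import math

def naive_log_prob(logits, i):
    # softmax first, then log: the probability underflows to 0.0 for
    # strongly negative relative logits, and log(0.0) raises a domain error
    exps = [math.exp(x) for x in logits]
    p = exps[i] / sum(exps)
    return math.log(p)

def fused_log_prob(logits, i):
    # log-softmax via the log-sum-exp trick: never forms the tiny probability
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return logits[i] - lse

logits = [0.0, -800.0]  # extreme but finite logits

print(fused_log_prob(logits, 1))   # -800.0, exact
try:
    naive_log_prob(logits, 1)
except ValueError as e:
    print("naive path failed:", e)  # math.exp(-800) underflowed to 0.0
```

This is why the loss functions want logits: the intermediate probabilities are never materialized, so no precision is lost.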