I started learning PyTorch recently after using TensorFlow for almost a year, and I am confused about something:
In TensorFlow, for a multi-class classification problem, we set the last layer to have one unit per class with a "Softmax" activation, and train with a cross-entropy loss.
But in PyTorch, when building a network, we set the last layer to nn.Linear().
Can anyone clarify the concept?
Thanks a lot.
Basically, in PyTorch the cross-entropy loss function combines a softmax and an NLLLoss function into one. So for training you end up not needing to add a softmax function to the model, because it is computed inside the loss function. However, for predicting values with your model you would have to use a softmax function. You can also include a softmax function in your model and just use the NLLLoss function, and it would do the same thing.
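A minimal sketch of the equivalence described above, using made-up logits and targets: PyTorch's `cross_entropy()` applies `log_softmax()` internally and then `nll_loss()`, so computing those two steps explicitly gives the same loss value.

```python
import torch
import torch.nn.functional as F

# Hypothetical raw outputs of a final nn.Linear layer:
# a batch of 4 samples over 3 classes, plus made-up target labels.
torch.manual_seed(0)
logits = torch.randn(4, 3)
targets = torch.tensor([0, 2, 1, 2])

# cross_entropy applies log_softmax internally, then nll_loss.
loss_ce = F.cross_entropy(logits, targets)

# The same loss computed explicitly in two steps.
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

assert torch.allclose(loss_ce, loss_manual)
```

This is why PyTorch models for classification typically end in a bare `nn.Linear` layer during training.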
Hi Dwight and Mohamed!
I would like to offer two clarifications:
> However, for predicting values with your model you would have to use a softmax function.

If by "predicting values" you mean predicting class labels, this is not true. To get a predicted class label from the probabilities returned by softmax() you take argmax(). But softmax() does not change the order of the values, so you can just as well take argmax() of the logits you would have input to softmax(). That is, you can skip softmax() entirely.

> You can also include a softmax function in your model and just use the NLLLoss function, and it would do the same thing.

This is not quite right. You would need to use pytorch's log_softmax() (rather than softmax()) followed by nll_loss() to reproduce pytorch's cross_entropy().
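The first clarification can be checked directly with a small sketch on made-up logits: because softmax() is monotonic within each row, the largest logit always maps to the largest probability, so argmax() gives the same predicted labels either way.

```python
import torch

# Hypothetical logits for a batch of 4 samples over 3 classes.
torch.manual_seed(0)
logits = torch.randn(4, 3)

# softmax() preserves the ordering within each row...
probs = torch.softmax(logits, dim=1)
pred_from_probs = probs.argmax(dim=1)

# ...so argmax() of the raw logits gives the same class labels.
pred_from_logits = logits.argmax(dim=1)

assert torch.equal(pred_from_probs, pred_from_logits)
```

So at inference time, softmax() is only needed if you actually want calibrated probabilities, not just the predicted class.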
Thank you a lot!
I will try both ideas.
Thank you a lot, K.Frank!
I will try today.
I have another question: what about using nn.Softmax() instead of log_softmax() with nll_loss()?
I saw documentation about the first function; will it work?