I started learning PyTorch recently after using TensorFlow for almost a year, and I am confused about something:
In TensorFlow, for a multi-class classification problem, we set the last layer to have one unit per class with a "Softmax" activation, and train with a cross-entropy loss.
But in PyTorch, when building a network, we set the last layer to nn.Linear().
Can anyone clarify the concept?
Thanks a lot.
Basically, in PyTorch the cross-entropy loss function combines a softmax and an NLLLoss function into one. So for training you end up not needing to add a softmax function to the model, because it is computed inside the loss function. However, for predicting values with your model you would have to use a softmax function. You can also include a softmax function in your model and just use the NLLLoss function, and it would do the same thing.
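A minimal sketch of the equivalence described above, using made-up logits and targets: PyTorch's `cross_entropy()` applies `log_softmax()` internally and then `nll_loss()`, so computing those two steps explicitly gives the same loss value.

```python
import torch
import torch.nn.functional as F

# Hypothetical raw outputs of a final nn.Linear layer:
# a batch of 4 samples over 3 classes, plus made-up target labels.
torch.manual_seed(0)
logits = torch.randn(4, 3)
targets = torch.tensor([0, 2, 1, 2])

# cross_entropy applies log_softmax internally, then nll_loss.
loss_ce = F.cross_entropy(logits, targets)

# The same loss computed explicitly in two steps.
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

assert torch.allclose(loss_ce, loss_manual)
```

This is why PyTorch models for classification typically end in a bare `nn.Linear` layer during training.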
Hi Dwight and Mohamed!
I would like to offer two clarifications:
> However, for predicting values with your model you would have to use a softmax function.

If by "predicting values" you mean predicting class labels, this is not true. To get a predicted class label from the probabilities returned by softmax() you take argmax(). But softmax() does not change the order of the values, so you can just as well take argmax() of the logits you would have input to softmax(). That is, you can skip softmax() entirely.

> You can also include a softmax function in your model and just use the NLLLoss function, and it would do the same thing.

This is not quite right. You would need to use pytorch's log_softmax() (rather than softmax()) followed by nll_loss() to reproduce pytorch's cross_entropy().
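The first clarification can be checked directly with a small sketch on made-up logits: because softmax() is monotonic within each row, the largest logit always maps to the largest probability, so argmax() gives the same predicted labels either way.

```python
import torch

# Hypothetical logits for a batch of 4 samples over 3 classes.
torch.manual_seed(0)
logits = torch.randn(4, 3)

# softmax() preserves the ordering within each row...
probs = torch.softmax(logits, dim=1)
pred_from_probs = probs.argmax(dim=1)

# ...so argmax() of the raw logits gives the same class labels.
pred_from_logits = logits.argmax(dim=1)

assert torch.equal(pred_from_probs, pred_from_logits)
```

So at inference time, softmax() is only needed if you actually want calibrated probabilities, not just the predicted class.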
Thank you a lot!
I will try both ideas.
Thank you a lot, K.Frank!
I will try today.
I have another question: what about using nn.Softmax() instead of log_softmax() with nll_loss()?
I saw documentation about the first function; will it work?