Multi classification activation function for last layer

I started learning PyTorch recently after using TensorFlow for almost a year, and I am confused about something:

In TensorFlow, when we have a multi-class classification problem, we set the number of classes on the last layer, use "Softmax" as its activation function, and train with “Cross-entropy loss”.

In PyTorch, however, when building a network we just set the last layer to nn.Linear().

Can anyone clarify the concept?

Thanks a lot. :slight_smile:

Basically, in PyTorch the cross entropy loss function combines a softmax and NLLLoss function into one. So for training you end up not needing to add a softmax function to the model, because it is already computed inside the loss function. However, for predicting values with your model you would have to use a softmax function. You can also include a softmax function in your model and just use the NLLLoss function, and it would do the same thing.


Hi Dwight and Mohamed!

I would like to offer two clarifications:

On having to use softmax() when predicting values: if by “predicting
values” you mean predicting class labels, this is not true. To get a
predicted class label from the probabilities returned by softmax() you
take argmax(). But softmax() does not change the order of the values,
so you can just as well take argmax() of the logits you would have input
to softmax(). That is, you can skip softmax().

On including softmax() in the model and then using nll_loss(): this is
not quite right. You would need to use pytorch’s log_softmax()
followed by nll_loss() to reproduce pytorch’s cross_entropy().
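
The equivalence can be checked numerically with a small sketch like the one below. The logits and target labels are invented for illustration; only the functional API calls are the real PyTorch ones.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical logits (4 samples, 3 classes) and integer class targets.
logits = torch.randn(4, 3)
targets = torch.tensor([0, 2, 1, 2])

# One step: cross_entropy() takes raw logits.
loss_ce = F.cross_entropy(logits, targets)

# Two steps: log_softmax() over the class dimension, then nll_loss().
log_probs = F.log_softmax(logits, dim=1)
loss_two_step = F.nll_loss(log_probs, targets)

# The two losses should match up to floating-point rounding.
print(torch.allclose(loss_ce, loss_two_step))
```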


K. Frank


Thank you a lot!
I will try both ideas :blush:

Thank you a lot K.Frank :smiling_face_with_three_hearts:
I will try today.

I have another question: what about using nn.Softmax() instead of log_softmax() with nll_loss()?

I saw the documentation for the first function. Will it work?
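
One way to check this yourself is a small experiment like the sketch below. The logits and targets are made up for illustration. It relies on nll_loss() being documented to expect log-probabilities, so passing plain softmax probabilities runs without an error but computes a different quantity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical logits (4 samples, 3 classes) and integer class targets.
logits = torch.randn(4, 3)
targets = torch.tensor([0, 2, 1, 2])

# Reference value: cross_entropy() on the raw logits.
loss_reference = F.cross_entropy(logits, targets)

# nn.Softmax() returns probabilities, not log-probabilities.
probs = nn.Softmax(dim=1)(logits)

# Feeding probabilities to nll_loss() gives -mean(p[target]),
# a negative number that is not the cross entropy.
loss_probs = F.nll_loss(probs, targets)

# Taking the log first recovers the reference value, although
# log_softmax() is the numerically safer way to get log-probabilities.
loss_log_probs = F.nll_loss(torch.log(probs), targets)

print(loss_reference.item(), loss_probs.item(), loss_log_probs.item())
```

So nn.Softmax() followed directly by nll_loss() would not reproduce cross_entropy(); it would need a log in between, which is what log_softmax() gives you in one call.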