What's the difference between nn.Softmax(), nn.softmax(), and nn.functional.softmax()?

Hi. I guess there are three ways to use an activation function in a custom module:

  1. Use nn.Softmax() in the initializer in the custom model.
  2. Use nn.softmax() in the forward function in the custom model.
  3. Use nn.functional.softmax() in the forward function in the custom model.

What are the differences among them and what is the proper way to use an activation function in my custom model?


nn.Softmax is an nn.Module, which can be initialized e.g. in the __init__ method of your model and used in the forward.
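A minimal sketch of that pattern (the model and layer sizes here are just made up for illustration):

```python
import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 3)
        # nn.Softmax is a Module: create it once in __init__ ...
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        # ... and call it like a function in forward
        return self.softmax(self.fc(x))

model = MyModel()
out = model(torch.randn(2, 4))
print(out.shape)  # torch.Size([2, 3])
```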

torch.softmax() (I assume nn.softmax is a typo, as this function is undefined) and nn.functional.softmax are equal, and I would recommend sticking to nn.functional.softmax, since it’s documented. @tom gives a better answer here. :wink:
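You can verify that the two calls agree (a quick sanity check on a random tensor):

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 5)
# torch.softmax and nn.functional.softmax compute the same thing
a = torch.softmax(x, dim=1)
b = F.softmax(x, dim=1)
print(torch.equal(a, b))  # True
```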


@ptrblck nails it with his comment.

We also do have a short discussion in the Deep Learning with PyTorch book in chapter 8, similar to the brief mention in slide 54 for a course I taught earlier this month - copied here for your convenience:

Some thoughts on functional vs. Module

  1. If you write for re-use, the functional / Module split of PyTorch has turned out
    to be a good idea.
  2. Use functional for stuff without state (unless you have a quick and dirty Sequential).
  3. Never re-use modules (i.e. defining one torch.nn.ReLU and using it 5 times). It’s a trap!
    When doing analysis or quantization (where ReLU becomes stateful due to quantization parameters), this will break.
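Point 3 can be illustrated with a small sketch (the layer sizes are arbitrary): sharing one ReLU instance works numerically, but tooling that walks the module tree sees only one node, so a fresh instance per use site is safer.

```python
import torch.nn as nn

# Risky: the same ReLU instance appears twice, so analysis or
# quantization tools see a single shared module at two call sites.
relu = nn.ReLU()
risky = nn.Sequential(nn.Linear(8, 8), relu, nn.Linear(8, 8), relu)

# Safer: one fresh instance per use site.
safe = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 8), nn.ReLU())

print(risky[1] is risky[3])  # True  -- shared instance
print(safe[1] is safe[3])    # False -- distinct instances
```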

The latter two seem to cover your case here, but 2. is more a matter of personal preference, while 3. is really about writing better (as in less risky, clearer) code.
Item 1 is more for when you write a library or something you expect to be re-used a lot.

Best regards



Thank you for your answer, but I’m still not sure what the differences between them are.

So can I say that functional is for re-use? What do ‘stuff without state’ and ‘quantization’ mean? Which should I use, functional or Module?

That is more “if you implement something new, provide both because you don’t know which is more convenient to your users”.

That’s another slide, there.
Or, try the “what is nn” tutorial.
Quantization has its own set of tutorials.

For softmax, and if you aren’t building a Sequential, I’d use the functional interface. But there is a modicum of taste in that.

Best regards


Thank you for your kind answer. I understand the ‘state’ now. I guess it refers to a weight in a neural network, right? Quantization is something I’ve never seen, so I need to read your reference. However, I’m a little bit confused, because I think there is no clear threshold for choosing between functional and Module. Maybe that’s because I’m not a native English speaker.

I apologise for the inconvenience, but if I don’t ask you this question, I will never know. Please excuse me for bothering you. I also want to quote what you mentioned, just like you did with mine, but I don’t know how to do it, so I just write it down.

That is more “if you implement something new, provide both because you don’t know which is more convenient to your users”

I don’t want to make an activation function. I just want to know proper way to use them (softmax, ReLU, whatever it is). However, I guess there is no clear standard for it. The functional are not saved in the stae of a neural network so I should avoid it if I want to save them as state_dict right?

For softmax, and if you aren’t building a Sequential , I’d use the functional interface.

Why would you do that? What’s the beneficial of using functional interface instead of Module?

Once again, I apologise for the inconvenience.

I think about modules as holding state (e.g. weights, as you mention) and using that and the inputs I pass to produce the output. So for softmax, using the functional interface expresses how I think about it (i.e. as a function of the inputs and no weights etc.).

But don’t make it a science, just go with what you prefer.