What's difference of nn.Softmax(), nn.softmax(), nn.functional.softmax()?

I think about modules as holding state (e.g. weights, as you mention) and using that and the inputs I pass to produce the output. So for softmax, using the functional interface expresses how I think about it (i.e. as a function of the inputs and no weights etc.).

But don’t make it a science, just go with what you prefer.

3 Likes