I think about modules as holding state (e.g. weights, as you mention) and using that and the inputs I pass to produce the output. So for softmax, using the functional interface expresses how I think about it (i.e. as a function of the inputs and no weights etc.).
But don’t make it a science, just go with what you prefer.