However, for the softmax function, for example, there's only torch.nn.Softmax and torch.nn.functional.softmax, but no torch.softmax. I'm confused and would like to know the thinking behind this design. Are there other functions designed like this?
So the idea is to put the more deep-learning-oriented functions in torch.nn.functional and keep general-purpose functions directly under torch. softmax was deemed to fall into the former category, sigmoid into the latter.
While torch.softmax does exist, it is there by accident rather than by design, which is why it is not documented (previous versions of PyTorch didn't have the fancy torch._C._nn module to hold the C++ implementations of the torch.nn.functional functions). I would advise using only the documented variants to stay out of trouble should someone start cleaning things up.
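For reference, a minimal sketch of the documented call sites mentioned above (the input shape and dim are arbitrary, just for illustration):

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 3)

# Deep-learning-oriented, documented variants live in torch.nn / torch.nn.functional:
probs_module = torch.nn.Softmax(dim=1)(x)  # module form
probs_func = F.softmax(x, dim=1)           # functional form

# General-purpose, documented tensor function lives directly under torch:
s = torch.sigmoid(x)

# torch.softmax(x, dim=1) also happens to work today, but it is undocumented,
# so prefer the variants above.
```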