It seems there are quite a few similar functions in these two modules.
Take activation functions (or loss functions) as an example: for me, the only difference is that we need to instantiate the ones in
torch.nn, but not the ones in torch.nn.functional.
What I want to know is whether there are any further differences, say, in efficiency?
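To make the question concrete, here is a minimal sketch of the two forms side by side, using ReLU as the example activation; both compute the same result:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(4, 10)

# Module form: instantiate first, then call.
relu_module = nn.ReLU()
out_module = relu_module(x)

# Functional form: call directly, no instantiation needed.
out_functional = F.relu(x)

print(torch.equal(out_module, out_functional))  # True
```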
No efficiency difference. The activation, dropout, etc. modules in
torch.nn are provided primarily to make it easy to use those operations in an
nn.Sequential container. Otherwise it’s simplest to use the functional form for any operations that don’t have trainable or configurable parameters.
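A small sketch of that distinction (the layer sizes are arbitrary): the module form drops straight into nn.Sequential, while the functional form goes inside forward():

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Module form fits naturally into an nn.Sequential container:
seq = nn.Sequential(
    nn.Linear(10, 20),
    nn.ReLU(),          # parameter-free op as a module
    nn.Linear(20, 5),
)

# Equivalent model using the functional form inside forward():
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 20)
        self.fc2 = nn.Linear(20, 5)

    def forward(self, x):
        return self.fc2(F.relu(self.fc1(x)))

x = torch.randn(3, 10)
print(seq(x).shape, Net()(x).shape)  # both torch.Size([3, 5])
```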
Personally, I think creating activation, dropout, pooling, etc. modules in
__init__ makes it easier to reuse the model. For example, when extracting features, you may wish to wrap a pretrained model and overwrite the
forward function to return the feature variables you need. Having these modules lets you do this conveniently, instead of inserting many functional calls.
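A sketch of that pattern, using a toy model in place of a real pretrained one: because the feature layers are modules, the wrapper can reuse them directly without repeating any functional calls.

```python
import torch
import torch.nn as nn

# A toy stand-in for a pretrained model; its layers are all modules.
class SmallNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, 3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(8 * 16 * 16, 10)

    def forward(self, x):
        f = self.features(x)
        return self.classifier(f.flatten(1))

# Wrap it and overwrite forward to return features instead of logits.
class FeatureExtractor(nn.Module):
    def __init__(self, model):
        super().__init__()
        self.features = model.features  # reuse the module as-is

    def forward(self, x):
        return self.features(x)

x = torch.randn(2, 3, 32, 32)
feats = FeatureExtractor(SmallNet())(x)
print(feats.shape)  # torch.Size([2, 8, 16, 16])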
But using the functional interface, we can do some fancy operations, like explicitly convolving two feature maps with each other via nn.functional.conv2d.
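A minimal sketch of that trick: the functional form lets one tensor act as the convolution kernel for another, which the nn.Conv2d module (whose weight is a fixed internal parameter) does not directly allow.

```python
import torch
import torch.nn.functional as F

# Treat one feature map as the input and another as the kernel.
feat = torch.randn(1, 1, 8, 8)    # (N, C_in, H, W)
kernel = torch.randn(1, 1, 3, 3)  # (C_out, C_in, kH, kW)

out = F.conv2d(feat, kernel, padding=1)
print(out.shape)  # torch.Size([1, 1, 8, 8])
```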
Correct me if I’m wrong, but it seems that when using nn.functional, net.parameters() won’t find the parameters/weights; you need to specify them explicitly.
I think this is correct; using nn.functional is lower-level: you can’t benefit from nn.Sequential, so you have to manually define the parameters. However, you can still use torch.optim to update those parameters in training.
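A sketch of the manual route: registering the weight as an nn.Parameter makes net.parameters() find it even though the convolution itself is a functional call, so torch.optim can update it as usual.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ManualConv(nn.Module):
    def __init__(self):
        super().__init__()
        # Registering the weight as nn.Parameter is what makes
        # net.parameters() (and hence torch.optim) see it.
        self.weight = nn.Parameter(torch.randn(8, 3, 3, 3))

    def forward(self, x):
        return F.conv2d(x, self.weight, padding=1)

net = ManualConv()
print(len(list(net.parameters())))  # 1 -- the manually defined weight

opt = torch.optim.SGD(net.parameters(), lr=0.1)
x = torch.randn(2, 3, 16, 16)
loss = net(x).sum()
loss.backward()
opt.step()  # torch.optim updates the manually registered parameter
```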
I believe both the input and the kernel are trainable. Can you explain what extra functionality torch.nn provides, or what more it inherits?