Nonlinearities in torch.nn vs torch.nn.functional


What is the difference between torch.nn.Sigmoid and torch.nn.functional.sigmoid?


torch.nn.Sigmoid is an nn.Module: you instantiate it as a class and then call the instance, while torch.nn.functional.sigmoid is a plain function.
You can use either one; in fact, torch.nn.Sigmoid essentially calls the functional interface internally.
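To make the distinction concrete, here is a minimal sketch (the input values are arbitrary):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.tensor([0.0, 1.0, -1.0])

# Module style: instantiate first, then call the instance
act = nn.Sigmoid()
y_module = act(x)

# Functional style: call the function directly, no instance needed
y_functional = F.sigmoid(x)

print(torch.allclose(y_module, y_functional))  # prints True
```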

It’s a matter of taste which one you use. For example, if you just want to write a basic fully-connected network, it might be easier to use an nn.Sequential and append modules to it, like

model = nn.Sequential(
    nn.Linear(2, 2),
    nn.Sigmoid(),
    nn.Linear(2, 2),
)
in which case the nn.Module interface is simpler. On the other hand, if you are writing more complex functions, it might be easier to just use the functional interface, since that saves you from having to create a class for it. For example:

import torch.nn.functional as F

# this is a function that autograd will differentiate through
def my_function(x):
    x1 = x ** 2
    x2 = F.sigmoid(x ** 3)
    return x1 + x2
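To illustrate that autograd handles such a function with no extra work, here is a sketch I added that checks the computed gradient against the hand-derived one, 2x + 3x² · σ(x³)(1 − σ(x³)), at an arbitrary point:

```python
import torch
import torch.nn.functional as F

def my_function(x):
    x1 = x ** 2
    x2 = F.sigmoid(x ** 3)
    return x1 + x2

x = torch.tensor(2.0, requires_grad=True)
y = my_function(x)
y.backward()  # populates x.grad

# hand-derived gradient: 2x + 3x^2 * s * (1 - s), where s = sigmoid(x^3)
s = torch.sigmoid(torch.tensor(8.0))
expected = 2 * 2.0 + 3 * 4.0 * s * (1 - s)
print(torch.allclose(x.grad, expected))  # prints True
```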

Thanks for your detailed answer!