What is the difference between torch.nn and torch.nn.functional?

They look quite similar…
so, is there any difference between them?


While the former defines nn.Module classes, the latter uses a functional (stateless) approach.
To dig a bit deeper: nn.Modules are defined as Python classes and have attributes, e.g. an nn.Conv2d module will have some internal attributes such as self.weight. F.conv2d, however, just defines the operation and needs all arguments to be passed in (including the weight and bias). Internally, the modules will usually call their functional counterpart somewhere in the forward method.
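A minimal sketch of this relationship (the input shapes and layer sizes are arbitrary, chosen just for illustration) — both calls produce the same result when the functional call is given the module's own weight and bias:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)

# Module API: the layer creates and owns its parameters
conv = nn.Conv2d(3, 6, kernel_size=3, padding=1)
out_module = conv(x)

# Functional API: the same operation, but weight and bias are passed explicitly
out_functional = F.conv2d(x, conv.weight, conv.bias, padding=1)

print(torch.allclose(out_module, out_functional))  # True
```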

That being said, it also depends on your coding style and how you would like to work with your modules/parameters etc. While modules might be good enough in most use cases, the functional API might give you additional flexibility, which is needed sometimes.
We’ve had a similar discussion recently in this thread.


en…, thank you, :hugs:

How do gradients flow in the case of nn.functional? I am a little confused. How do the weights get trained when using nn.functional?

Each operation is tracked by Autograd, if parameters that require gradients are involved.
The output of such an operation gets a .grad_fn attribute, which points to the backward function of the last operation:

x = torch.randn(1, 1)
w = nn.Parameter(torch.randn(1, 1))

output = x * w
print(output)
> tensor([[2.5096]], grad_fn=<MulBackward0>)

The backward call uses these grad_fns to calculate the gradients and stores them in the .grad attribute of the parameters:

output.backward()
print(w.grad)
> tensor([[1.1757]])
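To make the training part concrete: once the gradients land in w.grad, any optimizer that was given the parameter can update it. Here is a hedged toy sketch (the data and target relation y = 3 * x are made up for illustration) that trains a manually created nn.Parameter without any nn.Module layer:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy data: learn w such that x * w fits y = 3 * x
x = torch.randn(64, 1)
y = 3 * x

w = nn.Parameter(torch.randn(1, 1))
optimizer = torch.optim.SGD([w], lr=0.1)

for _ in range(200):
    optimizer.zero_grad()
    loss = ((x * w - y) ** 2).mean()
    loss.backward()   # Autograd fills w.grad via the tracked grad_fns
    optimizer.step()  # the optimizer updates w using w.grad

print(w.item())  # close to 3.0
```

The key point is that the optimizer only needs the list of parameters; it does not care whether they live inside an nn.Module or were created by hand for the functional API.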

Thanks for the reply,
but what are the fundamental differences between torch.nn.Conv1d and torch.nn.functional.conv1d?
I guess nn.Conv1d initializes the kernel weights automatically, while nn.functional.conv1d needs the kernel to be passed in…

My doubts…

  • Does gradient calculation and back-prop work in the same way for both of the above-mentioned methods?
  • Where would I want to use nn over nn.functional, and vice versa? (What is the need for nn.functional.conv1d when you already have nn.Conv1d?)

Have a look at this post for some more information and my point of view.

TLDR: the modules (nn.Module) internally use the functional API.
There is no difference as long as you store the parameters somewhere (manually, if you prefer the functional API, or in an nn.Module, which handles it automatically).
Having the nn.Module containers as an abstraction layer makes development easier while keeping the flexibility to use the functional API.
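A sketch of what "store the parameters somewhere" means in practice — a hypothetical module (name and sizes chosen for illustration) that registers its own weight and bias as nn.Parameters and calls F.conv1d in forward, so Autograd and optimizers see it exactly like an nn.Conv1d:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FunctionalConv1d(nn.Module):
    """Hypothetical layer: F.conv1d with manually registered parameters."""
    def __init__(self, in_channels, out_channels, kernel_size):
        super().__init__()
        # Manually create the parameters nn.Conv1d would otherwise own;
        # assigning nn.Parameter to an attribute registers them automatically
        self.weight = nn.Parameter(torch.randn(out_channels, in_channels, kernel_size))
        self.bias = nn.Parameter(torch.zeros(out_channels))

    def forward(self, x):
        # The stateless op gets the stored parameters passed in explicitly
        return F.conv1d(x, self.weight, self.bias)

x = torch.randn(2, 4, 10)
layer = FunctionalConv1d(4, 8, 3)
out = layer(x)
print(out.shape)  # torch.Size([2, 8, 8])
```

Since the parameters are registered on the module, layer.parameters() returns them and backward fills their .grad just as it would for nn.Conv1d.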


For my use case, I have added an extra parameter apart from weight and bias. Somehow the gradients for that extra parameter are zero, even though requires_grad is True. I can’t seem to figure out why this happens. Any ideas that anyone can think of? Thanks in advance.