They look almost the same… so, is there any difference between them?
While the former defines nn.Module classes, the latter uses a functional (stateless) approach.
To dig a bit deeper: nn.Modules are defined as Python classes and have attributes, e.g. an nn.Conv2d module will have some internal attributes like self.weight. F.conv2d, however, just defines the operation and needs all arguments to be passed (including the weights and bias). Internally, the modules usually call their functional counterpart somewhere in the forward method.
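To make this concrete, here is a small sketch of that relation (the shapes and padding are just example values):

import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)

# module: creates and stores self.weight and self.bias itself
conv = nn.Conv2d(3, 6, kernel_size=3, padding=1)
out_module = conv(x)

# functional: the same operation, but the parameters are passed explicitly
out_functional = F.conv2d(x, conv.weight, conv.bias, padding=1)

print(torch.allclose(out_module, out_functional))
> True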
That being said, it depends also on your coding style how you would like to work with your modules/parameters etc. While modules might be good enough in most use cases, the functional API might give you additional flexibility which is needed sometimes.
We’ve had a similar discussion recently in this thread.
Hmm… thank you.
How do gradients flow in the case of nn.functional? I am a little confused. How do the weights get trained in the case of nn.functional?
Each operation is tracked by Autograd, if parameters which require gradients are involved.
The output of such operations gets a .grad_fn attribute, which points to the backward function for the last operation:
import torch
import torch.nn as nn

x = torch.randn(1, 1)
w = nn.Parameter(torch.randn(1, 1))
output = x * w
print(output)
> tensor([[2.5096]], grad_fn=<MulBackward0>)
The backward call uses these grad_fns to calculate the gradients and stores them in the .grad attribute of the parameters:
output.backward()
print(w.grad)
> tensor([[1.1757]])
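The same mechanism applies when you use the functional API directly: as long as the weight you pass in is an nn.Parameter (or a tensor with requires_grad=True), Autograd will populate its .grad. A minimal sketch with F.conv1d (the shapes here are just example values):

import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 2, 10)                    # batch, channels, length
weight = nn.Parameter(torch.randn(4, 2, 3))  # out_channels, in_channels, kernel_size
bias = nn.Parameter(torch.zeros(4))

out = F.conv1d(x, weight, bias, padding=1)
out.mean().backward()

print(weight.grad.shape)
> torch.Size([4, 2, 3])

The “training” then works exactly as with modules: you pass these parameters to an optimizer, which uses their .grad attributes in its step() call.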
Thanks for the reply,
but what are the fundamental differences between torch.nn.Conv1d and torch.nn.functional.conv1d?
I guess nn.Conv1d initializes the kernel weights automatically, while nn.functional.conv1d needs the kernel to be passed in…
My doubts…
- Does gradient calculation and back-prop work in the same way for both of the above-mentioned methods?
- Where would I want to use nn over nn.functional and vice versa? (What is the need for nn.functional.conv1d when you already have nn.Conv1d?)
Have a look at this post for some more information and my point of view.
TL;DR: the modules (nn.Module) use the functional API internally.
There is no difference as long as you store the parameters somewhere (manually, if you prefer the functional API, or in an nn.Module “automatically”).
Having the nn.Module containers as an abstraction layer makes development easier and keeps the flexibility to use the functional API.
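For example, here is a small sketch of storing the parameters yourself in a custom module while using the functional API in its forward (the initialization is just illustrative):

import torch
import torch.nn as nn
import torch.nn.functional as F

class MyConv1d(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size):
        super().__init__()
        # registering the tensors as nn.Parameters makes them show up in
        # model.parameters(), so an optimizer can update them
        self.weight = nn.Parameter(torch.randn(out_channels, in_channels, kernel_size) * 0.1)
        self.bias = nn.Parameter(torch.zeros(out_channels))

    def forward(self, x):
        return F.conv1d(x, self.weight, self.bias)

model = MyConv1d(2, 4, 3)
print(sum(p.numel() for p in model.parameters()))
> 28

nn.Conv1d just does this bookkeeping (and the proper weight initialization) for you.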
For my use case, I have added an extra parameter apart from weight and bias. Somehow the gradients for that extra parameter are zero even though requires_grad is true. I can’t seem to figure out why this can happen. Any ideas that anyone can think of? Thanks in advance.