Just another point here.
nn.Module is actually an OO wrapper around the functional interface. It contains a number of utility methods, like parameters(), and it automatically creates and registers the parameters of the module for you.
You can use the functional interface whenever you want, but that requires you to define the weights by hand. Here is an example: https://github.com/szagoruyko/wide-residual-networks/tree/master/pytorch
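As a minimal sketch of what "defining the weights by hand" means (the class name and initialization scheme here are illustrative, not from the repo above), a linear layer built purely from the functional interface looks like this:

```python
import torch
import torch.nn.functional as F

class ManualLinear(torch.nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        # With the functional API, we must create and register
        # the parameters ourselves instead of relying on nn.Linear.
        self.weight = torch.nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = torch.nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        # F.linear computes x @ weight.T + bias
        return F.linear(x, self.weight, self.bias)

layer = ManualLinear(4, 2)
out = layer(torch.randn(3, 4))
print(out.shape)                        # torch.Size([3, 2])
print(len(list(layer.parameters())))    # 2 (weight and bias)
```

Because the tensors are wrapped in nn.Parameter, parameters() still finds them, so the optimizer setup is unchanged.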
Thanks Miguelvr, I get your points.
nn.Dropout is exactly the layer that needs to be added into the network, while functional.dropout will not behave differently in train and eval mode.
Thanks for the explanation.
It seems that using existing nn modules is the better choice for building networks, while nn.functional provides the basic building blocks of those layers.
If some custom layers need to be defined, then nn.functional may be used.
@Harry_Zhi You can have the functional dropout be aware of training/eval mode by passing training=self.training.
@mratsim, your comment is a very important one.
Would it be correct to say that if you don’t need your layer parameters to be optimized, you can define them using the functional API?
Yeah, that can be done manually as well.
For things that do not change between training/eval like sigmoid, relu, tanh, I think it makes sense to use functional; for others like dropout, I think it’s better to not use functional and use the module instead such that you get the expected behavior when calling model.eval() or model.train()
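A small sketch of that convention (the network shape here is arbitrary, just for illustration): the stateful dropout is a registered module so it reacts to model.eval(), while the stateless ReLU stays functional.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 10)
        # stateful: automatically respects model.train()/model.eval()
        self.drop = nn.Dropout(p=0.5)

    def forward(self, x):
        # stateless ops like relu are fine as functional calls
        return self.drop(F.relu(self.fc(x)))

net = Net()
net.eval()  # nn.Dropout becomes a no-op in eval mode
x = torch.randn(2, 10)
with torch.no_grad():
    y1, y2 = net(x), net(x)
print(torch.equal(y1, y2))  # True: dropout is disabled, so outputs match
```

Had dropout been a bare F.dropout call without the training flag wired up, the two forward passes could differ even after net.eval().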
Assuming nn.Dropout2d is always a better choice compared to F.dropout if we want to use dropout in the usual way (behavior changes between training/eval):
is there a situation where F.dropout is preferred over nn.Dropout2d?
In Tacotron 2, dropout is used in the decoder input during both training and inference. This is one example where one can use F.dropout, assuming it has the same behavior under model.train() and model.eval().
I’m assuming from PyTorch’s dropout API,
torch.nn.functional.dropout(input, p=0.5, training=False, inplace=False), that it doesn’t automatically change when one calls net.train() and net.eval() with functional dropout inside the model.
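That assumption can be checked directly. Passing the training flag explicitly (rather than relying on its default, which has differed across PyTorch versions) shows that the functional call only knows what we tell it:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.ones(5)

# training=False: dropout is a no-op, whatever mode any surrounding module is in
print(torch.equal(F.dropout(x, p=0.5, training=False), x))  # True

# training=True: elements are zeroed with probability p,
# and survivors are scaled by 1/(1-p) = 2.0 here
y = F.dropout(x, p=0.5, training=True)
print(set(y.tolist()) <= {0.0, 2.0})  # True
```

This is why the usual pattern is to forward the module's own flag, F.dropout(x, training=self.training), so that net.train()/net.eval() take effect.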
It would be good to have one of the PyTorch devs comment on this, for example @apaszke.
You can use the functional API and have it work with train/eval mode by doing this:
def forward(self, x):
    y = ...
    return F.dropout(y, training=self.training)
However, as the usual advice goes, I think it’s clearer to use modules for stateful functions (in this case dropout can be considered stateful, because of this flag), and functional calls for everything else.
Performance-wise, is there any preference between the two?
Hmm… you did not give the “dropout rate”…
The nn.Module side of PyTorch is built on top of functional, so there is a bit more overhead, but it is completely dwarfed by the time spent computing linear layers, convolutions, RNNs, and the other layers. (We’re speaking seconds vs. minutes/hours/days here.)
“There is a dropout layer missing” not because you used nn.functional in forward, but because you commented out the nn.Dropout in __init__.
If you uncomment it in __init__, the Dropout layer still appears in print(model), even if forward uses F.dropout.
In PyTorch, print(model) shows whatever is defined in __init__, not what is used in forward.
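A quick sketch of that behavior (the layer sizes here are arbitrary): the Dropout registered in __init__ shows up in print(model) even though forward never calls it and uses F.dropout instead.

```python
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)
        self.drop = nn.Dropout(p=0.5)  # registered, so it appears in print(model)

    def forward(self, x):
        # self.drop is never used here; print(model) lists it anyway,
        # because the repr reflects registered submodules, not forward logic
        return F.dropout(self.fc(x), p=0.5, training=self.training)

print(Net())
```

Conversely, the F.dropout call in forward leaves no trace in the printed model, which is exactly the confusion described above.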
I was having a lot of confusion about this as well. The nn.Module layers give us the .train() and .eval() flexibility, but if we initialize an activation or pooling operation as a layer with its own weights, how would giving parameters to these operations help achieve the goal we built the model for?
This question keeps bumping around in my mind and I would like to settle it once and for all, so please help.
I’m not sure what you mean by “giving parameters to these operations” exactly.
If you are not sure about whether to use an
nn.Module or the functional API, have a look at this longer post, where I describe my personal point of view.
Thanks for the reply,
I was assuming that the nn wrapper on any operation makes the operation trainable, like nn.Conv. But from the post you shared, I think operations like max pooling or average pooling cannot be trained, since they don’t have any parameters even when wrapped in an nn class. Sorry, I messed up my basics.
Thanks for the reply.
Check this out; the solution to this issue was clearly explained there.
Reading through this thread, I saw confusion across multiple posts about when to use nn vs. nn.functional. This is what people reading through this thread should read twice:
torch.nn is stateful.
torch.nn.functional is stateless.
You have to initialize the torch.nn modules so that their state can be tracked (not sure if “tracked” is the correct term).
Dropout is generally considered stateful, while functions like ReLU are stateless, as they do not have weights etc. to update or keep track of. If I am wrong in any manner, someone please chime in.
The torch.nn module is used more for layers which have learnable parameters, and functional for operations which do not have learnable parameters.
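That distinction can be made concrete with a one-liner for each side (the sizes here are arbitrary):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# nn.Linear carries learnable parameters that an optimizer can update...
layer = nn.Linear(3, 2)
print(sum(p.numel() for p in layer.parameters()))  # 8 (3*2 weights + 2 biases)

# ...while a stateless function like F.relu has nothing to track:
x = torch.tensor([-1.0, 2.0])
print(F.relu(x))  # tensor([0., 2.])
```

So the module form earns its keep whenever there is something for parameters() to return; otherwise the functional call is enough.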
Also check here, python - Pytorch: nn.Dropout vs. F.dropout - Stack Overflow