How to choose between torch.nn.functional and torch.nn modules?

Harry_Zhi · May 9, 2017, 2:11pm

Hi, everyone. I am new to PyTorch, coming from Keras. I find that PyTorch has plenty of existing modules and functions that I would like to use in my work. It is great.

I have trouble understanding the code in examples/mnist/main.py, where the CNN for MNIST is defined as shown below:

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)

I see dropout used in both the __init__ and forward methods; however, in __init__ we use Dropout2d from the torch.nn module, while in forward the dropout comes from torch.nn.functional. Is there any reason for this configuration? Why not use the same dropout twice?

Also, can we use nn.ReLU instead of F.relu in the forward part?

I hope someone can help me distinguish between the equivalent operations in nn and nn.functional, such as convolution, dropout and activation. When should I use the nn module, and when does nn.functional work better?

By the way, the parameters taken by nn.Conv2d and F.conv2d are also different, and the nn.Conv2d class calls the function F.conv2d in its forward method.
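To make the signature difference concrete, here is a small sketch (tensor sizes are arbitrary choices for the example): nn.Conv2d takes layer hyperparameters (in_channels, out_channels, kernel_size) and creates its own weight and bias, whereas F.conv2d takes the input together with explicit weight and bias tensors. Passing the module's own tensors to F.conv2d should reproduce the module's output.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Module form: the constructor takes hyperparameters and creates weight and bias.
conv = nn.Conv2d(in_channels=1, out_channels=10, kernel_size=5)
print(conv.weight.shape)  # torch.Size([10, 1, 5, 5])
print(conv.bias.shape)    # torch.Size([10])

x = torch.randn(8, 1, 28, 28)  # a batch of MNIST-sized images

# Functional form: the caller supplies the weight and bias explicitly.
y_module = conv(x)
y_functional = F.conv2d(x, conv.weight, conv.bias)

print(y_module.shape)                          # torch.Size([8, 10, 24, 24])
print(torch.allclose(y_module, y_functional))  # True
```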

miguelvr · May 9, 2017, 3:31pm

In PyTorch you define your models as subclasses of torch.nn.Module.

In the __init__ function, you are supposed to initialize the layers you want to use. Unlike Keras, PyTorch is more low-level, and you have to specify the sizes of your layers so that everything matches.

In the forward method, you specify the connections between your layers. This means you use the layers you already initialized, so that the same layer (with the same weights) is reused for every forward pass of data.

torch.nn.functional contains some useful functions, such as activation functions and convolution operations. However, these are not full layers, so if you want to specify a layer of any kind you should use torch.nn.Module.

You would use the torch.nn.functional conv operations to define a custom layer, for example one built around a convolution operation, but not to define a standard convolution layer.
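As a hedged illustration of that idea, the sketch below defines a hypothetical `NormalizedConv2d` layer (the name and the filter normalization are invented for this example; it is not a PyTorch class). It owns its weight as an nn.Parameter, rescales each output filter to unit L2 norm, and only then calls F.conv2d. A custom step like this is where the functional API helps; a standard convolution layer would simply be nn.Conv2d.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class NormalizedConv2d(nn.Module):
    """Hypothetical custom layer: a convolution whose filters are L2-normalized."""

    def __init__(self, in_channels, out_channels, kernel_size):
        super().__init__()
        # Tensors wrapped in nn.Parameter are registered automatically,
        # so model.parameters() and optimizers can see them.
        self.weight = nn.Parameter(
            0.1 * torch.randn(out_channels, in_channels, kernel_size, kernel_size))
        self.bias = nn.Parameter(torch.zeros(out_channels))

    def forward(self, x):
        # Custom step: rescale each output filter to unit L2 norm.
        w = F.normalize(self.weight.flatten(1), dim=1).view_as(self.weight)
        # Then apply the plain functional convolution with the modified weight.
        return F.conv2d(x, w, self.bias)


layer = NormalizedConv2d(1, 10, kernel_size=5)
out = layer(torch.randn(2, 1, 28, 28))
print(out.shape)  # torch.Size([2, 10, 24, 24])
print([name for name, _ in layer.named_parameters()])  # ['weight', 'bias']
```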

Let me know if it is clear!

Cheers

Harry_Zhi · May 9, 2017, 4:44pm

Thanks a lot for your detailed response, Miguel.

From your explanation, I understand that a torch.nn module is a real layer which can be added or connected to other layers or network models, whereas the functions in torch.nn.functional are just arithmetic operations, not layers with trainable parameters such as weights and bias terms.

As a result, layers with parameters are usually initialized in __init__ so they are shared by the whole module, while connections or simple operations without parameters can be written directly in forward to be used during forward propagation.
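A quick way to see why parameterized layers belong in `__init__`, sketched with two made-up class names: a layer created inside forward gets fresh random weights on every call and is never registered, so model.parameters() is empty and an optimizer would have nothing to train. Creating it once in `__init__` registers the weights and reuses them on every pass.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CreatedInInit(nn.Module):
    def __init__(self):
        super().__init__()
        # Registered once as a submodule; its weights are reused on every pass.
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        # Parameter-free ops such as relu can stay functional.
        return F.relu(self.fc(x))


class CreatedInForward(nn.Module):
    def forward(self, x):
        # Anti-pattern: builds a brand-new nn.Linear with new random weights on
        # every call, and it is never registered on the module.
        return F.relu(nn.Linear(4, 2)(x))


x = torch.randn(3, 4)
good, bad = CreatedInInit(), CreatedInForward()

print(len(list(good.parameters())))   # 2 (fc.weight and fc.bias)
print(len(list(bad.parameters())))    # 0: nothing for an optimizer to update
print(torch.equal(good(x), good(x)))  # True: the same weights are used each time
```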

As for dropout: since it has no trainable weights, we can use either the layer form (from torch.nn) or the connection form (from torch.nn.functional), is that correct?
By this I mean that if I delete the self.conv2_drop layer in __init__, I should change the forward method like this:

    import torch.nn as nn
    import torch.nn.functional as F

    class Net(nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
            self.conv2 = nn.Conv2d(10, 20, kernel_size=5)

            # delete this layer
            # self.conv2_drop = nn.Dropout2d()

            self.fc1 = nn.Linear(320, 50)
            self.fc2 = nn.Linear(50, 10)

        def forward(self, x):
            x = F.relu(F.max_pool2d(self.conv1(x), 2))

            # replace
            # x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
            # with the lines below; F.dropout2d (not F.dropout) matches
            # nn.Dropout2d, which zeroes whole channels rather than single elements
            x = self.conv2(x)
            x = F.dropout2d(x, training=self.training)
            x = F.relu(F.max_pool2d(x, 2))
            # or, in one line:
            # x = F.relu(F.max_pool2d(F.dropout2d(self.conv2(x), training=self.training), 2))

            x = x.view(-1, 320)
            x = F.relu(self.fc1(x))
            x = F.dropout(x, training=self.training)
            x = self.fc2(x)
            return F.log_softmax(x, dim=1)

Does changing the dropout layer into a functional call affect anything?

miguelvr · May 9, 2017, 4:57pm

Although nn.Dropout is not a trainable layer, it is still an nn.Module, so you should initialize it in __init__ first and then use it. I think it would raise an error otherwise.

Harry_Zhi · May 9, 2017, 5:42pm

Hi Miguel,
I have just tried the modified code and no error was raised.
From model.named_modules(), I can see the difference between these two networks.
For the original Net:

(conv1): Conv2d(1, 10, kernel_size=(5, 5), stride=(1, 1))
(conv2): Conv2d(10, 20, kernel_size=(5, 5), stride=(1, 1))
(conv2_drop): Dropout2d (p=0.5)
(fc1): Linear (320 -> 50)
(fc2): Linear (50 -> 10)

For the modified version:

('conv1', Conv2d(1, 10, kernel_size=(5, 5), stride=(1, 1)))
('conv2', Conv2d(10, 20, kernel_size=(5, 5), stride=(1, 1)))
('fc1', Linear (320 -> 50))
('fc2', Linear (50 -> 10))

The dropout layer is missing, but both models work and should perform the same operations in forward propagation.

I wonder whether these two networks behave the same during training and testing.

@apaszke @smth @colesbury
I hope you can also give me some suggestions if possible.
Thanks.

miguelvr · May 10, 2017, 9:26am

I’m just guessing here, but based on the documentation I would say that only nn.Dropout changes behaviour when you call model.eval() and model.train(), while F.dropout is the plain dropout operation. You should probably keep using nn.Dropout.
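A small check of that guess, as a sketch (`DropoutDemo` is a made-up name): nn.Dropout follows the module's train/eval flag, while F.dropout only looks at its own `training` argument, which in current PyTorch releases defaults to True. So F.dropout called without that argument keeps dropping units even after model.eval().

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DropoutDemo(nn.Module):
    def __init__(self):
        super().__init__()
        self.drop = nn.Dropout(p=0.5)

    def forward(self, x):
        module_out = self.drop(x)             # follows self.training
        functional_out = F.dropout(x, p=0.5)  # `training` left at its default
        return module_out, functional_out


net = DropoutDemo().eval()  # switch to evaluation mode
x = torch.ones(1000)
module_out, functional_out = net(x)

print(torch.equal(module_out, x))         # True: nn.Dropout is a no-op in eval mode
print(bool((functional_out == 0).any()))  # F.dropout still zeroes roughly half the entries
```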

fmassa · May 10, 2017, 11:05pm

Just another point here.
An nn.Module is actually an OO wrapper around the functional interface that contains a number of utility methods, like eval() and parameters(), and it automatically creates the parameters of the modules for you.
You can use the functional interface whenever you want, but that requires you to define the weights by hand. Here is an example: https://github.com/szagoruyko/wide-residual-networks/tree/master/pytorch
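To illustrate that point, here is a minimal sketch of a purely functional two-layer classifier. The `init_params` and `forward` helpers and the layer sizes are assumptions for the example (loosely mirroring fc1/fc2 in the MNIST net), not code from the linked repository. The weights live in a plain dict that you create, hand to the optimizer, and thread through forward yourself, which is the bookkeeping nn.Module would otherwise do.

```python
import torch
import torch.nn.functional as F


def init_params(in_features=320, hidden=50, classes=10):
    # Hand-made weights; requires_grad_() lets autograd track them.
    def weight(out_f, in_f):
        return (torch.randn(out_f, in_f) / in_f ** 0.5).requires_grad_()

    return {
        "fc1.weight": weight(hidden, in_features),
        "fc1.bias": torch.zeros(hidden, requires_grad=True),
        "fc2.weight": weight(classes, hidden),
        "fc2.bias": torch.zeros(classes, requires_grad=True),
    }


def forward(params, x, training):
    x = F.relu(F.linear(x, params["fc1.weight"], params["fc1.bias"]))
    # There is no self.training here, so the mode must be passed in explicitly.
    x = F.dropout(x, p=0.5, training=training)
    return F.log_softmax(F.linear(x, params["fc2.weight"], params["fc2.bias"]), dim=1)


params = init_params()
optimizer = torch.optim.SGD(list(params.values()), lr=0.1)

x, target = torch.randn(16, 320), torch.randint(0, 10, (16,))
loss = F.nll_loss(forward(params, x, training=True), target)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(loss.item())
```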

Harry_Zhi · May 12, 2017, 12:19am

Thanks, Miguelvr, I get your point.
nn.Dropout is exactly the layer that needs to be added to the network, while functional.dropout will not behave differently in train and eval mode.

Harry_Zhi · May 12, 2017, 12:21am

Thanks for the explanation.

It seems that using existing nn modules is the better choice for building networks, while nn.functional provides the basic building blocks of those layers.

If custom layers need to be defined, then nn.functional may be used.

mratsim · May 12, 2017, 11:25am

@Harry_Zhi You can make the functional dropout aware of training/eval mode with F.dropout(x, training=self.training).
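A sketch of that suggestion applied to the channel-wise case from the modified MNIST net (`FunctionalDrop` is an illustrative name): passing training=self.training ties F.dropout2d to the module's mode. Because nn.Dropout2d forwards to the same functional call, both should draw the same channel mask under the same random seed in training mode, and both should be the identity after .eval().

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FunctionalDrop(nn.Module):
    def forward(self, x):
        # Tie the functional call to the module's train/eval flag.
        return F.dropout2d(x, p=0.5, training=self.training)


functional_drop = FunctionalDrop()
module_drop = nn.Dropout2d(p=0.5)
x = torch.randn(4, 20, 8, 8)

# Training mode: with the same RNG seed both draw the same channel mask.
torch.manual_seed(0)
a = functional_drop.train()(x)
torch.manual_seed(0)
b = module_drop.train()(x)
print(torch.equal(a, b))  # True

# Evaluation mode: both are the identity.
print(torch.equal(functional_drop.eval()(x), x))  # True
print(torch.equal(module_drop.eval()(x), x))      # True
```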

Royi · November 18, 2017, 11:25am

@mratsim, your comment is a very important one.

Would it be correct to say that if you don’t need your layer’s parameters to be optimized, you should define them using the functional interface?

Rafael_Valle · January 16, 2018, 7:47pm

Yeah, that can be done manually as well.
For things that do not change between training and eval, like sigmoid, relu and tanh, I think it makes sense to use functional; for others like dropout, I think it’s better to use the module instead of functional, so that you get the expected behavior when calling model.eval() or model.train().
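For the case Royi raised (a fixed tensor that should not be optimized), one common option is sketched below, assuming the tensor should still move with the model under .to() and be saved in state_dict: register it as a buffer and apply it with the functional op. The `SobelX` name and the Sobel kernel are illustrative choices, not something from this thread.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SobelX(nn.Module):
    """Illustrative fixed (non-trainable) horizontal-gradient filter."""

    def __init__(self):
        super().__init__()
        kernel = torch.tensor([[-1.0, 0.0, 1.0],
                               [-2.0, 0.0, 2.0],
                               [-1.0, 0.0, 1.0]]).view(1, 1, 3, 3)
        # A buffer is saved in state_dict and moved by .to(), but it is not a
        # parameter, so optimizers never update it.
        self.register_buffer("kernel", kernel)

    def forward(self, x):
        return F.conv2d(x, self.kernel, padding=1)


sobel = SobelX()
print(len(list(sobel.parameters())))    # 0
print(list(sobel.state_dict().keys()))  # ['kernel']

ramp = torch.arange(5.0).repeat(5, 1).view(1, 1, 5, 5)  # intensity grows left to right
print(sobel(ramp)[0, 0, 2, 2].item())   # 8.0
```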

ashish_arora · March 3, 2018, 9:43pm

@Rafael_Valle,
Assuming nn.Dropout2d is always a better choice than F.dropout when we want to use dropout in the usual way (behavior changes between training and eval),
is there a situation where F.dropout is preferred over nn.Dropout2d?

Rafael_Valle · March 3, 2018, 10:52pm

In Tacotron 2, dropout is applied to the decoder input during both training and inference. This is one example where one can use F.dropout, assuming it has the same behavior under model.train() and model.eval().

From PyTorch’s dropout API, torch.nn.functional.dropout(input, p=0.5, training=False, inplace=False), I’m assuming that functional dropout inside a model doesn’t automatically change behavior when one calls net.train() or net.eval().
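A sketch of that Tacotron 2 style case (the `Prenet` class and its layer sizes are assumptions for illustration, not the reference implementation): hard-coding training=True keeps dropout active even after model.eval(). In current PyTorch releases F.dropout's `training` argument defaults to True, so passing it explicitly mainly documents the intent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Prenet(nn.Module):
    """Hypothetical decoder pre-net whose dropout is always active."""

    def __init__(self, in_dim=80, sizes=(256, 256), p=0.5):
        super().__init__()
        in_sizes = [in_dim] + list(sizes[:-1])
        self.layers = nn.ModuleList([nn.Linear(i, o) for i, o in zip(in_sizes, sizes)])
        self.p = p

    def forward(self, x):
        for linear in self.layers:
            # training=True on purpose: ignore self.training so dropout stays on
            # during inference as well as training.
            x = F.dropout(F.relu(linear(x)), p=self.p, training=True)
        return x


prenet = Prenet().eval()
x = torch.randn(4, 80)
torch.manual_seed(0)
first = prenet(x)
torch.manual_seed(1)
second = prenet(x)
print(torch.equal(first, second))  # different dropout masks, even in eval mode
```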

It would be good to have one of the PyTorch devs comment on this, for example @apaszke.

apaszke · March 4, 2018, 8:23pm

You can use the functional API and have it work with train/eval mode by doing this:

def forward(self, x):
    y = ...
    return F.dropout(y, training=self.training)

However, as the usual advice goes, I think it’s clearer to use modules for stateful functions (in this case dropout can be considered stateful because of the training flag), and functional for everything else.

Royi · April 22, 2018, 4:02pm

Performance-wise, is there any preference between the two?

tian_wen · August 20, 2018, 2:41am

Unhan…You did not give the “dropout rate”…

mratsim · September 1, 2018, 11:53am

PyTorch’s nn.Module layers are built on top of the functional API, so there is a bit more overhead, but it is completely dwarfed by the time spent computing linear layers, convolutions, RNNs and other layers (we’re talking seconds versus minutes/hours/days here).

Ancalagon · January 8, 2019, 12:44am

“There is a dropout layer missed” is not because you used nn.functional in forward, but because you commented out the nn.Dropout2d in __init__.
If you uncomment it in __init__, the dropout layer still appears in print(model), even if you use F.dropout in forward.

In PyTorch, print(model) shows whatever modules are defined in __init__, not what is used in forward.
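A quick sketch of that point (the class name is made up): a module registered in `__init__` shows up in print(model) and named_children() even if forward never calls it, while functional calls made inside forward never appear there.

```python
import torch.nn as nn
import torch.nn.functional as F


class NetWithUnusedModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(320, 50)
        self.unused_drop = nn.Dropout2d()  # registered, but never called below

    def forward(self, x):
        return F.dropout(F.relu(self.fc(x)), training=self.training)


model = NetWithUnusedModule()
print(model)  # lists fc and unused_drop, but not F.relu or F.dropout
print([name for name, _ in model.named_children()])  # ['fc', 'unused_drop']
```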

rustagiadi95 · September 27, 2019, 4:14am

I was quite confused about this as well. Although nn module layers give us the .train() and .eval() flexibility, if we initialize an activation or pooling operation as a layer with its own weights, how does giving parameters to these operations help achieve the goal we built the model for?

This question keeps bugging me and I would like to settle it once and for all, so please help.
Thanks,
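On that question, a small sketch suggests that wrapping an activation or pooling operation in a module does not give it any weights: nn.ReLU and nn.MaxPool2d register no parameters and produce the same output as F.relu and F.max_pool2d. The module form mainly lets the operation sit inside containers such as nn.Sequential and appear in print(model).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

relu, pool = nn.ReLU(), nn.MaxPool2d(kernel_size=2)
print(len(list(relu.parameters())), len(list(pool.parameters())))  # 0 0

x = torch.randn(1, 3, 8, 8)
print(torch.equal(relu(x), F.relu(x)))           # True
print(torch.equal(pool(x), F.max_pool2d(x, 2)))  # True

# The module form is what containers like nn.Sequential expect;
# only the convolution contributes parameters.
block = nn.Sequential(nn.Conv2d(3, 4, kernel_size=3), nn.ReLU(), nn.MaxPool2d(2))
print([name for name, _ in block.named_parameters()])  # ['0.weight', '0.bias']
print(block(x).shape)  # torch.Size([1, 4, 3, 3])
```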