Dropout functional API, advantages/disadvantages?

rasbt · January 25, 2017, 3:47am

I saw in one of the examples that the functional API was used to implement dropout for the fully connected layer but not for the conv layer, which uses a module instead. Is there a specific reason for this? I pasted the code below.

source: https://github.com/pytorch/examples/blob/master/mnist/main.py

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = F.relu(self.fc2(x))
        return F.log_softmax(x, dim=1)  # explicit dim: normalize over the class dimension

Am I correct that this is the same as the following code?

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc1_drop = nn.Dropout(p=0.5) # added this line
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = self.fc1_drop(x) # added this line
        #x = F.dropout(x, training=self.training) # removed this line
        x = F.relu(self.fc2(x))
        return F.log_softmax(x, dim=1)  # explicit dim: normalize over the class dimension

I am curious: did you choose this example for demonstration purposes, to show both nn modules and the functional API, or are there performance (or other) reasons?

apaszke · January 25, 2017, 4:40pm

No, there is no performance reason: they’re equivalent and they expand to calls to the same autograd functions. The same applies to all other layers. The functional and module versions work exactly the same, and choosing between them is only a matter of personal taste.
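A quick way to check the equivalence claim for dropout is to seed the random number generator identically before each call and compare the outputs. This is only a minimal sketch on CPU with a made-up input tensor; the variable names are illustrative and not taken from the MNIST example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.ones(4, 8)  # illustrative input, not from the example

drop = nn.Dropout(p=0.5)  # a freshly constructed module starts in training mode

torch.manual_seed(0)
out_module = drop(x)

torch.manual_seed(0)
out_functional = F.dropout(x, p=0.5, training=True)

# With the same seed, both calls draw the same dropout mask.
same = torch.equal(out_module, out_functional)
print(same)
```

With p=0.5, surviving entries are scaled by 1 / (1 - p), so every output element is either 0 or 2.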

apaszke · January 25, 2017, 4:41pm

In that tutorial I probably missed the dropout module and that’s why I didn’t change it to functional. I find it best to use the functional API for parameter-less operations and modules for everything that contains parameters.
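The convention described above could look roughly like the sketch below. SmallNet is a hypothetical model made up for illustration (it is not the tutorial's Net): layers with weights are registered as modules, while ReLU, dropout, and log-softmax are called through the functional API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallNet(nn.Module):  # hypothetical model, for illustration only
    def __init__(self):
        super().__init__()
        # Layers with learnable parameters are registered as modules.
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        # Parameter-less operations use the functional API.
        x = F.relu(self.fc1(x))
        x = F.dropout(x, p=0.5, training=self.training)
        return F.log_softmax(self.fc2(x), dim=1)

model = SmallNet()
param_names = [name for name, _ in model.named_parameters()]
print(param_names)  # only the Linear layers contribute parameters

# Dropout has nothing to learn, so its module form holds no parameters either.
drop_params = list(nn.Dropout(p=0.5).parameters())

log_probs = model(torch.randn(2, 320))
```

Since dropout carries no parameters, moving it between the module and functional form does not change what the optimizer sees or what ends up in the state dict.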

surag · February 8, 2018, 2:34am

But using functional makes this tedious, does it not?

rasbt · February 8, 2018, 3:00am

After I got back into PyTorch for some research projects lately, I adopted the habit of using the functional API for everything that does not have parameters (in the sense of weights, biases, etc.)

Regarding dropout, I think there’s no issue with that, as you can specify the training/eval mode via, e.g.,

x = F.dropout(out, p=dropout_prob, training=self.training)

PS: I don’t use nn.Sequential, though, because I usually like to be able to access intermediate states conveniently.

two_four · January 17, 2019, 8:14am

This answer may give the best explanation of the difference between nn.Dropout and nn.functional.dropout. Generally speaking, the two provide the same functionality (see the source code). The main difference is that nn.Dropout is a wrapper class around nn.functional.dropout that derives from nn.Module. nn.Dropout is therefore meant to be defined as a layer in your model, so that when the model’s state changes, the nn.Dropout layer’s state changes with it. For example, after calling model.eval(), an nn.Dropout layer passes all activations through unchanged. With nn.functional.dropout, you have to handle this manually, e.g. by passing training=self.training. For more details, please see that answer.
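To make the eval-mode difference concrete, here is a small sketch comparing three variants after model.eval(). The three class names are hypothetical and exist only for this comparison.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModuleDrop(nn.Module):  # hypothetical: dropout as a registered layer
    def __init__(self):
        super().__init__()
        self.drop = nn.Dropout(p=0.5)

    def forward(self, x):
        return self.drop(x)

class FunctionalDropForgot(nn.Module):  # hypothetical: training flag not passed
    def forward(self, x):
        return F.dropout(x, p=0.5)  # training defaults to True

class FunctionalDrop(nn.Module):  # hypothetical: training flag passed explicitly
    def forward(self, x):
        return F.dropout(x, p=0.5, training=self.training)

x = torch.ones(1000)
results = {}
for model in (ModuleDrop(), FunctionalDropForgot(), FunctionalDrop()):
    model.eval()  # sets self.training = False on the model and its submodules
    # True means dropout was disabled and the input passed through unchanged.
    results[type(model).__name__] = torch.equal(model(x), x)

print(results)
# {'ModuleDrop': True, 'FunctionalDropForgot': False, 'FunctionalDrop': True}
```

The middle variant keeps dropping activations at evaluation time, which is the case you have to guard against when using the functional form.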