Making a Custom Dropout Function

Hello everyone! This is my first post. What brought me here was my curiosity about experimenting with neural networks; the other libraries I tried (keras, theano, etc.) felt very limiting.

I came across pytorch and noticed that it’s good for experiments. I wanted to know how I could make a custom Dropout function that, when given the weights of a layer, produces a vector of masks and then applies the mask during forward propagation. I have some code with me, and I really hope any of y’all could help me! THANKS

import torch
import torch.nn as nn
import torchvision.datasets as dsets
import torchvision.transforms as transforms
from torch.autograd import Variable
import torch.nn.functional as F

# Hyper Parameters 
input_size = 784
hidden_size = 500
num_classes = 10
num_epochs = 5
batch_size = 100
learning_rate = 0.001

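# MNIST Dataset
train_dataset = dsets.MNIST(root='./data', train=True,
                            transform=transforms.ToTensor(), download=True)

test_dataset = dsets.MNIST(root='./data', train=False,
                           transform=transforms.ToTensor())

# Data Loader (Input Pipeline)
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size, shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size, shuffle=False)
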
class Dropout(nn.Module):
    def __init__(self, p=0.5, inplace=False):
        super(Dropout, self).__init__()
        if p < 0 or p > 1:
            raise ValueError("dropout probability has to be between 0 and 1, "
                             "but got {}".format(p))
        self.p = p
        self.inplace = inplace

    def forward(self, input):
        return F.dropout(input, self.p, self.training, self.inplace)

    def __repr__(self):
        inplace_str = ', inplace' if self.inplace else ''
        return self.__class__.__name__ + '(' \
            + 'p=' + str(self.p) \
            + inplace_str + ')'

# Neural Network Model (1 hidden layer)
class Net(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size) 
        self.relu = nn.ReLU()
        self.dropout = Dropout(0.2)
        self.fc2 = nn.Linear(hidden_size, num_classes)  
    def forward(self, x):
        out = self.fc1(x)
        out = self.relu(out)
        out = self.dropout(out)
        out = self.fc2(out)
        return out
net = Net(input_size, hidden_size, num_classes)

# Loss and Optimizer
criterion = nn.CrossEntropyLoss()  
optimizer = torch.optim.Adam(net.parameters(), lr=learning_rate)  

# Train the Model
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):  
        # Convert torch tensor to Variable
        images = Variable(images.view(-1, 28*28))
        labels = Variable(labels)
        # Forward + Backward + Optimize
        optimizer.zero_grad()  # zero the gradient buffer
        outputs = net(images)
        loss = criterion(outputs, labels)
        loss.backward()   # compute gradients
        optimizer.step()  # update the weights
        if (i+1) % 100 == 0:
            print ('Epoch [%d/%d], Step [%d/%d], Loss: %.4f' 
                   %(epoch+1, num_epochs, i+1, len(train_dataset)//batch_size, loss.item()))

# Test the Model
net.eval()  # switch dropout off for evaluation
correct = 0
total = 0
for images, labels in test_loader:
    images = Variable(images.view(-1, 28*28))
    outputs = net(images)
    _, predicted = torch.max(outputs.data, 1)
    total += labels.size(0)
    correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (100 * correct / total))

# Save the Model
torch.save(net.state_dict(), 'model.pkl')


I guess the simplest would be to create a custom version of the Linear layer that uses your Dropout layer. Something like:

# I have not tested this code, it might contain typos! :)
class MyLinear(nn.Linear):
    def __init__(self, in_feats, out_feats, drop_p, bias=True):
        super(MyLinear, self).__init__(in_feats, out_feats, bias=bias)
        self.masker = Dropout(p=drop_p)

    def forward(self, input):
        masked_weight = self.masker(self.weight)
        return F.linear(input, masked_weight, self.bias)

Thank you very much for your fast reply! Please excuse me, I’m rather new to Pytorch, but your code is very readable!

So the class that you wrote would go, say:
self.fc1 = nn.Linear(input_size, hidden_size)
self.relu = nn.ReLU()

and what I’m guessing is that self.masker(self.weight) takes the weights from its input (the layer before it) and sends them to the (custom) Dropout function, which returns the masked weights? Oh, and about:
return F.linear(input, masked_weight, self.bias): is masked_weight the output of whatever activation function I use, times the dropout mask? And why do I return F.linear(input…)? Why input? Don’t I need to return only the output of the layer?

Thank you!

Ok, maybe my answer lacked a bit of context:
Here is how I understood your question: you want to have a way to have a dropout layer on the weight for a given layer. So for a layer that would do output = f(input, weight), you want a dropout layer such that you get output = f(input, dropout(weight)).
Given that the only layer that you use that has parameters in your code is a nn.Linear, that is why I used that in my example.
Given the current implementation of nn.Linear, the simplest way to apply dropout on the weights is by creating a new class as in my first answer that I will call MyLinear.
Then to use it, you simply replace self.fc1 = nn.Linear(input_size, hidden_size) with self.fc1 = MyLinear(input_size, hidden_size, dropout_p). That way, when you call out = self.fc1(x) later, the dropout will be applied within the forward call of self.fc1.
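
To make the replacement concrete, here is a rough sketch of the whole thing wired together. I have not run it against your script. nn.Dropout stands in for your custom Dropout class, and drop_p=0.2 plus the batch of random inputs are only assumptions for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MyLinear(nn.Linear):
    """Linear layer whose weight matrix goes through dropout first."""

    def __init__(self, in_feats, out_feats, drop_p, bias=True):
        super().__init__(in_feats, out_feats, bias=bias)
        # Assumption: nn.Dropout stands in for the custom Dropout class.
        self.masker = nn.Dropout(p=drop_p)

    def forward(self, input):
        masked_weight = self.masker(self.weight)
        return F.linear(input, masked_weight, self.bias)


class Net(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes, drop_p=0.2):
        super().__init__()
        # The only change from the original Net: fc1 is now a MyLinear.
        self.fc1 = MyLinear(input_size, hidden_size, drop_p)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        out = self.fc1(x)  # weight dropout happens inside this call
        out = self.relu(out)
        return self.fc2(out)


net = Net(784, 500, 10)
x = torch.randn(4, 784)  # fake batch of 4 flattened images
out = net(x)
print(out.shape)  # torch.Size([4, 10])
```

The rest of your training loop should not need to change, since MyLinear is still an nn.Linear as far as the optimizer is concerned.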

To be more precise on the forward function implemented above, it is basically implementing a linear layer for which the weight matrix has gone through a dropout layer before.
To do so, it first passes the original weights through the dropout layer (which I called self.masker because you can see applying dropout as masking part of the matrix), then it uses these masked weights to do what a regular linear layer would do, as you can see in the original nn.Linear implementation.
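
If it helps, the two steps can be written out by hand. This is only a sketch with made-up shapes: the mask is drawn manually here instead of going through F.dropout, and the 1 / (1 - p) rescaling mirrors what dropout does in training mode.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
p = 0.5
weight = torch.randn(3, 4)  # (out_features, in_features), as in nn.Linear
bias = torch.randn(3)
x = torch.randn(2, 4)       # batch of 2 inputs

# Step 1: "dropout on the weights" = random 0/1 mask, survivors rescaled.
mask = (torch.rand_like(weight) > p).float()
masked_weight = weight * mask / (1 - p)

# Step 2: a regular linear layer, just using the masked weights.
out_linear = F.linear(x, masked_weight, bias)
out_manual = x @ masked_weight.t() + bias

print(torch.allclose(out_linear, out_manual, atol=1e-6))
```

So input is there because F.linear computes the layer output from it; the return value is the output of the layer.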


Hey again! Much thanks for your support :)

I was wondering: if I can do this on the weights, is there a way to do it on the outputs of the layers? I’m trying to replicate Dropout using my own functions, but I believe what we have in this thread applies Dropout to the weights of the layer. I hope I’m making sense! Many thanks for your replies, they’re extremely helpful.

The main advantage of pytorch is actually that your forward function can be any python function that works with pytorch’s Variables. So you can create a layer that does anything you want, including using other nn.Modules. Basically as long as you don’t get an error that tells you that you’re doing something forbidden, then it works.
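
For the output case, a minimal sketch could look like this. my_dropout is a hypothetical helper name, and p=0.2 is just an assumption carried over from your code. It always applies the mask, so it does not yet distinguish training from evaluation:

```python
import torch
import torch.nn as nn


def my_dropout(x, p=0.2):
    """Hypothetical helper: zero each activation with probability p."""
    mask = (torch.rand_like(x) > p).to(x.dtype)
    return x * mask / (1.0 - p)  # rescale so the expected value is unchanged


class Net(nn.Module):
    def __init__(self, input_size=784, hidden_size=500, num_classes=10):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        out = torch.relu(self.fc1(x))
        out = my_dropout(out, p=0.2)  # mask the layer output, not the weights
        return self.fc2(out)


net = Net()
loss = net(torch.randn(4, 784)).sum()
loss.backward()  # autograd follows the plain tensor ops in my_dropout
print(net.fc1.weight.grad.shape)  # torch.Size([500, 784])
```

Since the mask is built from ordinary tensor operations, no custom backward is needed.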


I should be careful and not mix in any numpy functions in the pytorch functions, right? I bet pytorch has all I need for fast matrix multiplication/manipulations?

Anyways, you’ve given me enough information and motivation to keep on trying and learning Pytorch! Thank you very much :)

Yes, you should not mix in numpy operations: autograd only tracks pytorch operations, so anything computed in numpy won’t be differentiable and you won’t be able to backpropagate through it.
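
A small illustration of what goes wrong, assuming a recent pytorch version where tensors carry requires_grad directly (the values are made up):

```python
import torch

x = torch.ones(3, requires_grad=True)

# Pure pytorch: the operations are recorded, so gradients reach x.
y = (x * 2).sum()
y.backward()
print(x.grad)  # tensor([2., 2., 2.])

# Round trip through numpy: the tensor has to be detached first,
# and the result comes back with no graph attached.
z_np = x.detach().numpy() * 2
z = torch.from_numpy(z_np).sum()
print(z.requires_grad)  # False
```

Calling z.backward() here would raise an error, because nothing connects z back to x.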

Have fun using pytorch!


Follow up question to this concept. If I don’t want to use F.dropout as my lowest level implementation of dropout, how do I create a layer that behaves differently during training and evaluation?
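
One possible approach, as a sketch rather than a definitive answer: every nn.Module has a self.training flag that .train() and .eval() toggle, so forward can branch on it. MyDropout is a hypothetical name, and restricting p to [0, 1) is my own assumption to avoid dividing by zero:

```python
import torch
import torch.nn as nn


class MyDropout(nn.Module):
    """Hand-written dropout that does not call F.dropout."""

    def __init__(self, p=0.5):
        super().__init__()
        if p < 0 or p >= 1:
            raise ValueError("p has to be in [0, 1), but got {}".format(p))
        self.p = p

    def forward(self, x):
        # self.training is set by .train() / .eval() on this module or a parent.
        if not self.training or self.p == 0:
            return x
        mask = (torch.rand_like(x) > self.p).to(x.dtype)
        return x * mask / (1.0 - self.p)


drop = MyDropout(p=0.5)
x = torch.ones(2, 4)

drop.train()
print(drop(x))  # each entry is either 0 or 2

drop.eval()
print(torch.equal(drop(x), x))  # True
```

When the layer sits inside a bigger model, calling net.train() or net.eval() on the model sets the flag on every submodule.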
