Custom Mask for dropout

I am trying to provide my own dropout mask in the forward function while defining a feed forward network.
For example

class Perceptron(torch.nn.Module):
    def __init__(self, mask):
        super(Perceptron, self).__init__()
        self.fc = nn.Linear(5, 10)
        self.myDrop = MyDrop(mask)

    def forward(self, x):
        output = self.fc(x)
        o1 = self.myDrop(output)
        return o1

Here mask is a vector of 0's and 1's, so the values of o1 will accordingly be either 0 or the existing values. In MyDrop() I simply multiply the values of output by mask to get o1. The gradient should therefore be zero for the dropped nodes, but how does autograd handle this, given that I am not writing a custom Dropout class and am instead applying a mask through a simple function? During backprop, will the gradient still flow through those nodes even though their output has been zeroed by the mask?


Yes, Autograd will track the multiplication as a differentiable operation and will calculate the gradients accordingly.
Are you seeing any issues using your approach?
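To illustrate, here is a minimal sketch (the layer sizes and the fixed mask are made up for demonstration): autograd records the elementwise multiplication, so the gradient reaching each linear unit is scaled by the corresponding mask entry, and the weight rows feeding masked-out units receive zero gradient.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

fc = nn.Linear(5, 10)
x = torch.randn(3, 5)
# fixed 0/1 mask over the 10 output units (hypothetical values)
mask = torch.tensor([1., 0., 1., 1., 0., 1., 0., 1., 1., 0.])

out = fc(x) * mask   # autograd tracks this multiplication
out.sum().backward()

# d(out)/d(fc output) = mask, so weight rows for masked units get zero gradient
print(fc.weight.grad[1].abs().sum())  # unit 1 was masked -> 0
print(fc.weight.grad[0].abs().sum())  # unit 0 was kept   -> nonzero
```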

Sorry for the late reply. Yes, backpropagation is working, but the gradient of the first linear layer is always zero (I can't figure out why; any suggestions would be helpful). I still have some confusion which I want to clear up.

1. Do we need to multiply the mask after getting the activation output from the linear layer, e.g. f1 = nn.Linear(), then mask_output = mask * self.Relu(f1(x)), or proceed differently, multiplying the output x of the previous layer by the mask first:
# x is the output of the previous layer
mask_out = x * mask
out1 = self.Relu(f1(mask_out))
2. Since we are using our own mask as dropout, at evaluation time we are passing a variable to the model to check whether we are in train or eval state using model.eval(). Is this correct?

3. Also, since PyTorch implements dropout as inverted dropout, how do we handle the train and eval cases with our own mask?

1. It depends: some apply dropout before the activation, others after it. In the case of ReLU, though, these two methods are equivalent.
2. Yes, you should check whether the model is in "train" or "eval" mode; nn.Module has an attribute called training for this.
3. You should take a closer look at how inverted dropout works (I suggest you also look at how normal dropout works). It's quite simple to implement.
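Putting points 2 and 3 together, a minimal sketch of a mask-based inverted dropout module might look like this (the class name, the fixed mask, and the choice of p are assumptions for illustration, not the original poster's MyDrop): scale kept units by 1/(1 - p) during training, and do nothing at evaluation time.

```python
import torch
import torch.nn as nn

class MaskedDropout(nn.Module):
    """Sketch: inverted dropout with a user-supplied 0/1 mask."""
    def __init__(self, mask, p=0.5):
        super().__init__()
        self.register_buffer("mask", mask)  # buffer so it follows .to()/.cuda()
        self.p = p  # nominal drop probability used for the inverted-dropout scaling

    def forward(self, x):
        if self.training:  # set by model.train() / model.eval()
            # inverted dropout: zero dropped units, scale kept units by 1/(1-p)
            return x * self.mask / (1.0 - self.p)
        # eval mode: identity, no mask and no scaling needed
        return x

mask = torch.tensor([1., 0., 1., 1.])
drop = MaskedDropout(mask, p=0.5)

x = torch.ones(2, 4)
drop.train()
print(drop(x))  # kept units scaled to 2.0, dropped unit is 0
drop.eval()
print(drop(x))  # unchanged input
```

If the mask is fixed rather than resampled, you could instead derive p from the mask itself, e.g. p = 1 - mask.mean(), so the scaling matches the actual fraction of dropped units.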