I am trying to provide my own dropout mask in the forward function while defining a feed forward network.
For example:

import torch
import torch.nn as nn

class Perceptron(torch.nn.Module):
    def __init__(self, mask):
        super(Perceptron, self).__init__()
        self.fc = nn.Linear(5, 10)
        self.myDrop = MyDrop(mask)

    def forward(self, x):
        output = self.fc(x)
        o1 = self.myDrop(output)
        return o1
Here mask is a vector consisting of 0's and 1's, so the value of o1 will accordingly be either 0 or the existing value. In MyDrop() I simply multiply the values of output with the mask to get o1 as the output. So the gradient should be zero for the masked nodes, but how does autograd maintain this, since I am not writing a custom Dropout class but instead just applying a mask through a simple function? During backprop, will the gradient still flow through those nodes even though their output has been made 0 by the mask?
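For context, MyDrop can be sketched as a minimal module like this (the implementation below is my own sketch of the idea: just an elementwise multiply with a fixed mask stored as a buffer):

```python
import torch
import torch.nn as nn

class MyDrop(nn.Module):
    """Sketch of the mask-based dropout described above (assumed implementation)."""
    def __init__(self, mask):
        super().__init__()
        # register the fixed 0/1 mask as a buffer (not a learnable parameter)
        self.register_buffer("mask", mask.float())

    def forward(self, x):
        # elementwise multiply; autograd records this operation
        return x * self.mask

mask = torch.tensor([1., 0., 1., 0., 1.])
drop = MyDrop(mask)
x = torch.ones(5, requires_grad=True)
drop(x).sum().backward()
print(x.grad)  # tensor([1., 0., 1., 0., 1.]) -- zero gradient where the mask is zero
```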
Yes, Autograd will track the multiplication as a differentiable operation and will calculate the gradients accordingly.
Are you seeing any issues using your approach?
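As a quick sanity check (the shapes and mask values here are made up), you can verify that the rows of the linear layer's weight matrix that feed masked-out units receive zero gradient:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
fc = nn.Linear(5, 10)
mask = torch.tensor([1., 0., 1., 0., 1., 0., 1., 0., 1., 0.])  # assumed mask
x = torch.randn(3, 5)

out = fc(x) * mask  # autograd tracks the multiplication
out.sum().backward()

# row i of fc.weight produces output unit i, so masked rows get zero gradient
row_grad = fc.weight.grad.abs().sum(dim=1)
print(row_grad)  # zeros exactly where mask == 0
```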
Sorry for the late reply. Yes, the backpropagation is working, but the first linear layer's gradient is always zero (I can't figure out why; any suggestions would be helpful). I still have some confusion that I want to clear up.
- Do we need to multiply the mask after getting the activation output from the linear layer, like f1 = nn.Linear(), then mask_output = mask * self.Relu(f1(x)), or proceed in a different way, where the output x of the previous layer is multiplied by the mask first, like
# x is the output of the previous layer
mask_out = x * mask
f1 = nn.Linear()
out1 = self.Relu(f1(mask_out))
- Since we are using our own mask as dropout, at evaluation time we are passing a variable to the model to check whether we are in the train or eval state (set via model.eval()). Is this correct?
- Also, since PyTorch implements dropout as inverted dropout, how do we handle the train and eval cases with our own mask?
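On the first question: the two orderings are not equivalent. Masking after the activation zeroes this layer's output units, while masking x first zeroes the previous layer's outputs before they reach the linear layer. A small sketch (dimensions and mask values are made up):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(2, 5)                      # output of the previous layer
f1 = nn.Linear(5, 5)
relu = nn.ReLU()
mask = torch.tensor([1., 0., 1., 0., 1.])

# Option A: mask the activations of this layer -> drops f1's output units
out_a = mask * relu(f1(x))

# Option B: mask the input first -> drops the previous layer's output units
out_b = relu(f1(x * mask))

print(out_a)  # columns where mask == 0 are exactly zero
print(out_b)  # generally nonzero everywhere; only the inputs were dropped
```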
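On the second and third questions: instead of passing a flag into the model, an nn.Module can check self.training, which model.train() / model.eval() toggle automatically; and inverted dropout just means scaling the kept units by one over the keep fraction during training, so no rescaling is needed at eval time. A sketch combining both (the class name and the choice of scaling by mask.mean() are my assumptions):

```python
import torch
import torch.nn as nn

class InvertedMaskDrop(nn.Module):  # hypothetical name
    def __init__(self, mask):
        super().__init__()
        self.register_buffer("mask", mask.float())
        self.keep = self.mask.mean()  # fraction of units kept (assumed > 0)

    def forward(self, x):
        if self.training:  # toggled by model.train() / model.eval()
            # inverted dropout: scale kept units up during training
            return x * self.mask / self.keep
        return x           # identity at evaluation time, no rescaling needed

drop = InvertedMaskDrop(torch.tensor([1., 0., 1., 0.]))
x = torch.ones(4)
print(drop(x))  # train mode (default): tensor([2., 0., 2., 0.])
drop.eval()
print(drop(x))  # eval mode: tensor([1., 1., 1., 1.])
```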