Hi,
I am wondering how Dropout is actually implemented in PyTorch. I found the source but I can't understand much of it. My main question is: how can PyTorch apply dropout when it is only given the output of a linear layer that was computed without any dropout?
Basically, I mean the code is usually something like this
x1 = F.relu(self.fc1(x))
x2 = self.dropout1(x1)
So dropout1 function receives tensor x1, and has no idea what the input to fc1 layer was, nor does it know the weights. Mathematically, it is clear that dropout function needs to know tensor x and weights of layer fc1 to randomly remove weights and get an output.
How, then, is it able to compute x2 properly without taking x or fc1 weights as function arguments?
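Dropout operates independently of the previous and next layers. It does not remove weights; it acts only on the tensor it is given. It is nothing but keeping each element of the input with some probability and replacing the rest with zero. To choose the elements, you can draw one sample from a Bernoulli distribution (a binomial with a single trial) for each element of the input. nn.Dropout also rescales the kept elements by 1/(1-p) during training, so the expected value of each element stays the same.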
Here’s a simple implementation:
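import torch

keep_prob = 0.7  # probability of keeping an element (drop probability p = 0.3)
dis = torch.distributions.bernoulli.Bernoulli(probs=keep_prob)
w = torch.nn.Linear(3, 2)
input = torch.randn(8, 3)
h = w(input)
# Draw one 0/1 value per element of h; the mask depends only on h's shape.
mask = dis.sample(h.shape)
# Zero the dropped elements and rescale the rest, as nn.Dropout does in training mode.
output = h * mask / keep_prob
print(output)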
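To tie this back to the fc1/dropout1 snippet in the question, here is a rough sketch of dropout as a function of the activations alone. The helper name manual_dropout and its arguments are made up for illustration and are not PyTorch's internal implementation. It only assumes the documented behaviour of nn.Dropout: in training mode each element is zeroed with probability p and the survivors are scaled by 1/(1-p).

```python
import torch

def manual_dropout(x, p=0.5, training=True):
    """Hypothetical helper: inverted dropout applied to an activation tensor."""
    if not training or p == 0.0:
        return x                      # evaluation mode: identity
    if p == 1.0:
        return torch.zeros_like(x)    # everything dropped; avoids dividing by zero
    keep_prob = 1.0 - p
    # One independent Bernoulli draw per element of x: 1 = keep, 0 = drop.
    mask = torch.bernoulli(torch.full_like(x, keep_prob))
    # Rescale the survivors so the expected value of each element is unchanged.
    return x * mask / keep_prob

torch.manual_seed(0)
fc1 = torch.nn.Linear(3, 2)
x = torch.randn(8, 3)

x1 = torch.relu(fc1(x))
x2 = manual_dropout(x1, p=0.3)        # needs only x1, never x or fc1.weight

drop = torch.nn.Dropout(p=0.3)        # a new module starts in training mode
y = drop(x1)                          # built-in version, a different random mask
print(x2)
print(y)
```

In both outputs every entry is either 0 or the matching entry of x1 divided by 0.7. The weights of fc1 are never touched, which is why dropout1 does not need them as an argument.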