# Understanding how dropout works

Hi,
I am wondering how Dropout is actually implemented in PyTorch. I found the source code, but I can't understand much of it. My main question is: how can PyTorch apply dropout when all it receives is the output of the linear layer?
Basically, I mean the code is usually something like this

```python
x1 = F.relu(self.fc1(x))
x2 = self.dropout1(x1)
```

So the `dropout1` function receives tensor `x1`, and has no idea what the input to the `fc1` layer was, nor does it know that layer's weights. Mathematically, it seems clear to me that the dropout function needs to know tensor `x` and the weights of `fc1` in order to randomly remove weights and produce an output.
How, then, is it able to compute `x2` properly without taking `x` or the `fc1` weights as arguments?

Dropout operates independently of the previous and the next layer: it is nothing but sampling elements of its input with some probability and zeroing out the rest. To sample the elements of a tensor, you can draw from a binomial distribution once per element of the input.

Here’s a simple implementation:

```python
import torch

# Keep each element with probability 0.7
# (Binomial with the default total_count=1 is a Bernoulli distribution)
dis = torch.distributions.binomial.Binomial(probs=0.7)
w = torch.nn.Linear(3, 2)
input = torch.randn(8, 3)
h = w(input)
# Elementwise 0/1 mask with the same shape as h; dropped elements become zero
output = h * dis.expand(h.shape).sample()
print(output)
```
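One detail the mask-only sketch above leaves out: PyTorch's `nn.Dropout` uses *inverted* dropout, rescaling the surviving elements by `1 / (1 - p)` during training so that no extra scaling is needed at inference. A minimal check of that behavior (assuming `p = 0.5`, a shape chosen just for illustration):

```python
import torch

torch.manual_seed(0)
p = 0.5  # probability of ZEROING an element (not of keeping it)
drop = torch.nn.Dropout(p=p)
drop.train()  # dropout is only active in training mode

x1 = torch.randn(8, 4)
out = drop(x1)

# Every element of the output is either zero, or the original value
# scaled up by 1 / (1 - p)
zeroed = out == 0
print(out)
```

The output has the same shape as the input, and every surviving element equals `x1 / (1 - p)` at the same position.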

I feel so stupid here. I thought the Dropout function was applied to tensor `x` in my example. If I understand correctly, it is actually applied to `x1`, then?

That is true. In fact, you can replace the previous layer with any other operation. Think of dropout as a colander with random holes.
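A quick way to convince yourself that dropout only ever sees the tensor it is given: feed it a raw random tensor with no layer in front of it at all. In `eval()` mode dropout is simply the identity, which also shows it keeps no state about `fc1` or any other layer:

```python
import torch

drop = torch.nn.Dropout(p=0.5)
drop.eval()  # in eval mode, dropout passes everything through unchanged

x1 = torch.randn(4, 3)  # could come from any layer, or from nowhere at all
out = drop(x1)
assert torch.equal(out, x1)
```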