How to fix the dropout mask across different batches

Suppose we have 2 mini-batches (each with 10 data points). When dropout is turned on for the forward pass of the first mini-batch, a dropout mask of dimension 10 is generated. What if we want to use the same mask for the second batch of data?

You could do this dropout operation yourself instead of using nn.Dropout.

You can generate a Bernoulli mask using torch.bernoulli and then multiply both mini-batches with the same mask.

For example:

# generate a mask of the same shape as input1
# (a keep probability of 0.5 is assumed here for illustration)
mask = Variable(torch.bernoulli(input1.data.new(input1.data.size()).fill_(0.5)))

output1 = input1 * mask
output2 = input2 * mask

Is it correct to rescale the mask so the output keeps the same magnitude, in the following way?

mask = Variable(torch.bernoulli(input1.data.new(input1.data.size()).fill_(0.5))) / 0.5

Looks good to me…

Best regards



For future readers, I would like to mention that the rescaling above is not correct.
Please note that the Bernoulli distribution samples 0 with the probability (1-p), contrary to dropout implementations, which sample 0 with probability p.

Therefore, if you want dropout with p=0.4, the mask has to be
mask = Bernoulli(torch.full_like(input1, 0.6)).sample()/0.6

For dropout with p=0.6 the mask is
mask = Bernoulli(torch.full_like(input1, 0.4)).sample()/0.4
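
As a sanity check, the corrected mask can be sketched with torch.distributions.Bernoulli (the input shape, seed, and p=0.4 below are just illustrative assumptions):

```python
import torch
from torch.distributions import Bernoulli

torch.manual_seed(0)   # illustrative seed
p = 0.4                # dropout probability: zero out with probability p
keep_prob = 1 - p

x = torch.ones(10, 5)  # assumed input shape

# Bernoulli(keep_prob) samples 1 with probability keep_prob = 1 - p,
# and dividing by keep_prob rescales the kept units (inverted dropout)
mask = Bernoulli(torch.full_like(x, keep_prob)).sample() / keep_prob

out = x * mask
# kept entries are scaled to 1 / keep_prob, dropped entries are 0,
# so the mask has expected value 1 and the output magnitude is preserved
print(out.mean())
```

Over many samples, out.mean() fluctuates around 1, which is exactly what the rescaling is meant to achieve.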


What is .sample() in your code?

If I understand correctly, this way I generate a different mask (i.e. different dropout) for each element in the batch (since torch.full_like(input1, …) also includes the batch dimension in the mask’s shape); shouldn’t I rather apply the same mask to each element in the batch?
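
If the goal is indeed one mask shared by every element of the batch, one option (a sketch with assumed shapes, not taken from this thread) is to sample the mask without the batch dimension and rely on broadcasting:

```python
import torch
from torch.distributions import Bernoulli

p = 0.4
keep_prob = 1 - p

# assumed batch of 10 images, each of shape 3x32x32
batch = torch.randn(10, 3, 32, 32)

# sample the mask for a single element (no batch dimension) ...
mask = Bernoulli(torch.full((3, 32, 32), keep_prob)).sample() / keep_prob

# ... and let broadcasting apply the identical mask to every element
out = batch * mask

# the zeroed positions coincide across all images in the batch
print((out[0] == 0).equal(out[1] == 0))
```

Sampling the per-batch variant instead is just a matter of including the batch dimension in the shape passed to torch.full.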