Suppose we have 2 minibatches (each with 10 data points). When dropout is turned on for the forward pass of the first minibatch, a dropout mask of dimension 10 is generated. What if we want to use the same mask for the second batch of data?
You can do this dropout operation yourself instead of using the built-in dropout layer: generate a Bernoulli mask with
torch.bernoulli and then multiply both mini-batches by the same mask.
# generate a mask of same shape as input1
mask = Variable(torch.bernoulli(input1.data.new(input1.data.size()).fill_(0.5)))
output1 = input1 * mask
output2 = input2 * mask
Is it correct to rescale the mask so the output has the same magnitude, in the following way?
mask = Variable(torch.bernoulli(input1.data.new(input1.data.size()).fill_(0.4)))/0.6
Looks good to me…
For future readers, I would like to mention that the rescaling above is not correct.
Please note that the Bernoulli distribution samples 0 with the probability (1-p), contrary to dropout implementations, which sample 0 with probability p.
Therefore, if you want dropout with p=0.4, the mask has to be (with Bernoulli from torch.distributions):
mask = Bernoulli(torch.full_like(input1, 0.6)).sample()/0.6
For dropout with p=0.6 the mask is
mask = Bernoulli(torch.full_like(input1, 0.4)).sample()/0.4
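As a quick sanity check of this rescaling, here is a minimal sketch (assuming torch.distributions.Bernoulli, whose .sample() draws a 0/1 tensor from the distribution):

```python
import torch

torch.manual_seed(0)

p = 0.4            # dropout probability (fraction of units zeroed)
keep_prob = 1 - p  # Bernoulli must sample 1 with probability 1 - p

x = torch.ones(100_000)

# Inverted dropout: zero units with probability p, scale survivors by 1/(1 - p).
# .sample() draws a tensor of 0s and 1s from the Bernoulli distribution.
mask = torch.distributions.Bernoulli(torch.full_like(x, keep_prob)).sample() / keep_prob
y = x * mask

# The mean of y stays close to 1.0, so the expected activation is preserved.
print(y.mean().item())
```

Because the surviving units are scaled by 1/(1 - p), the expectation of the masked tensor matches the original, which is the point of the rescaling.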
What is .sample() in your code?
If I understand correctly, this way I generate a different mask (i.e. dropout pattern) for each element in the batch, since
input1.data.size() also includes the batch dimension; shouldn't I rather apply the same mask to each element in the batch?
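To reuse one dropout pattern for every sample, you can draw the mask over the feature dimensions only and let broadcasting apply it across the batch. A minimal sketch (the 10×5 shapes are made up for illustration):

```python
import torch

torch.manual_seed(0)

batch1 = torch.randn(10, 5)  # two mini-batches: 10 samples, 5 features each
batch2 = torch.randn(10, 5)

p = 0.4  # dropout probability

# Draw the mask over the feature dimension only (shape (5,), not (10, 5)),
# so every sample in both batches shares the same dropout pattern.
keep = torch.distributions.Bernoulli(torch.full((5,), 1 - p)).sample() / (1 - p)

out1 = batch1 * keep  # the (5,) mask broadcasts over the batch dimension
out2 = batch2 * keep  # identical mask reused for the second mini-batch
```

Sampling with input1.data.size() as in the earlier snippet gives an independent pattern per sample, which is what standard dropout layers do; whether a shared pattern is preferable depends on what you want to keep fixed between batches.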