Dropout for RNNs

supakjk · February 28, 2017, 7:06pm

What I did to use the same dropout mask for different time steps was inheriting classes as follows:

class SeqConstDropoutFunc(torch.nn._functions.dropout.Dropout):
        def __init__(self, p=0.5, train=False, inplace=False):
                super(SeqConstDropoutFunc, self).__init__(p, train, inplace)

        def _make_noise(self, input):   # for timesteps X batches X dims inputs, let each time step has the same dropout mask
                return input.new().resize_(1, input.size(1), input.size(2))

class SeqConstDropout(nn.Dropout):
        def __init__(self, p=0.5, inplace=False):
                super(SeqConstDropout, self).__init__(p, inplace)

        def forward(self, input):
                return SeqConstDropoutFunc(self.p, self.training, self.inplace)(input)

It seems that overring _make_noise isn’t a good idea. Then, I’ll either notice the changes of the code or make an independent dropout class.

Thanks.