To use the same dropout mask across different time steps, I subclassed the existing dropout classes as follows:
```python
import torch
import torch.nn as nn

class SeqConstDropoutFunc(torch.nn._functions.dropout.Dropout):
    def __init__(self, p=0.5, train=False, inplace=False):
        super(SeqConstDropoutFunc, self).__init__(p, train, inplace)

    def _make_noise(self, input):
        # For (timesteps x batches x dims) inputs, create the noise with a
        # singleton time dimension so every time step gets the same mask.
        return input.new().resize_(1, input.size(1), input.size(2))

class SeqConstDropout(nn.Dropout):
    def __init__(self, p=0.5, inplace=False):
        super(SeqConstDropout, self).__init__(p, inplace)

    def forward(self, input):
        return SeqConstDropoutFunc(self.p, self.training, self.inplace)(input)
```
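Assuming the old-style autograd Function API this code targets (pre-0.4 PyTorch with `Variable`), a quick sanity check that the mask really is shared across time steps might look like this (the tensor sizes are made up):

```python
from torch.autograd import Variable

drop = SeqConstDropout(p=0.5)
drop.train()  # dropout is only active in training mode
x = Variable(torch.ones(10, 4, 8))  # timesteps x batches x dims
y = drop(x)
# With a shared mask, every time step shows the same zero pattern:
print((y.data[0] == y.data[1]).all())
```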
It seems that overriding `_make_noise` isn't a good idea, since it relies on PyTorch internals. I'll either keep watching that internal code for changes or write an independent dropout class instead.
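For the record, here is a minimal sketch of what such an independent class could look like, using only the public tensor API. The name `LockedDropout` is my own, and the `new_empty`/`bernoulli_` calls assume a reasonably recent PyTorch; adjust for older versions:

```python
import torch.nn as nn

class LockedDropout(nn.Module):
    # Sketch: sample one dropout mask per sequence and broadcast it
    # over the time dimension, without touching PyTorch internals.
    def __init__(self, p=0.5):
        super(LockedDropout, self).__init__()
        self.p = p

    def forward(self, x):  # x: (timesteps, batches, dims)
        if not self.training or self.p == 0:
            return x
        # One Bernoulli mask with a singleton time dimension, scaled by
        # 1/(1-p) (inverted dropout), shared across time via broadcasting.
        mask = x.new_empty(1, x.size(1), x.size(2)).bernoulli_(1 - self.p)
        return x * mask / (1 - self.p)
```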
Thanks.