Yes, that would be the case.
You could either set the `.dropout` attribute to zero or disable cuDNN for this layer (in the latter case you wouldn't need to call `train()` on it, but you might see a slowdown):
```python
import torch
import torch.nn as nn

# Option 1: zero out the dropout attribute
rnn = nn.RNN(10, 20, 2, dropout=0.5).cuda()
rnn.dropout = 0.0

input = torch.randn(5, 3, 10).cuda()
h0 = torch.randn(2, 3, 20).cuda()

for _ in range(2):
    output, hn = rnn(input, h0)
    output.mean().backward()
    print(output.mean())
```
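As a quick sanity check (a minimal CPU sketch, no GPU required, so it doesn't use the cuDNN path above), you can verify that zeroing the attribute actually deactivates dropout: with dropout active, repeated forward passes on the same input differ, while after setting `.dropout = 0.0` they match:

```python
import torch
import torch.nn as nn

x = torch.randn(5, 3, 10)
h0 = torch.randn(2, 3, 20)

# With dropout active in train mode, repeated forward passes differ,
# since a fresh dropout mask is sampled between the stacked layers each time
rnn = nn.RNN(10, 20, 2, dropout=0.5)
o1, _ = rnn(x, h0)
o2, _ = rnn(x, h0)
print(torch.allclose(o1, o2))  # False (different dropout masks)

# After zeroing the attribute, the forward passes are deterministic
rnn.dropout = 0.0
o3, _ = rnn(x, h0)
o4, _ = rnn(x, h0)
print(torch.allclose(o3, o4))  # True
```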
Alternatively, keep the dropout setting and disable cuDNN for the forward pass:

```python
import torch
import torch.nn as nn

# Option 2: keep dropout, but run the RNN without cuDNN
rnn = nn.RNN(10, 20, 2, dropout=0.5).cuda().eval()

input = torch.randn(5, 3, 10).cuda()
h0 = torch.randn(2, 3, 20).cuda()

for _ in range(2):
    with torch.backends.cudnn.flags(enabled=False):
        output, hn = rnn(input, h0)
    output.mean().backward()
    print(output.mean())
```