'model.eval()' vs 'with torch.no_grad()'

Yes, that would be the case.
You could either set the .dropout attribute to zero or disable cuDNN for this layer (then you wouldn't need to call train() on it, but you might see a slowdown):

import torch
import torch.nn as nn

# Option 1: zero out the dropout attribute directly
rnn = nn.RNN(10, 20, 2, dropout=0.5).cuda()
rnn.dropout = 0.0
input = torch.randn(5, 3, 10).cuda()
h0 = torch.randn(2, 3, 20).cuda()
for _ in range(2):
    output, hn = rnn(input, h0)
    output.mean().backward()
    print(output.mean())


# Option 2: call eval() and disable cuDNN for the forward pass
rnn = nn.RNN(10, 20, 2, dropout=0.5).cuda().eval()
input = torch.randn(5, 3, 10).cuda()
h0 = torch.randn(2, 3, 20).cuda()
for _ in range(2):
    with torch.backends.cudnn.flags(enabled=False):
        output, hn = rnn(input, h0)
    output.mean().backward()
    print(output.mean())
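As a side note on the thread title: model.eval() and torch.no_grad() are orthogonal. eval() switches module behavior (dropout becomes a no-op, batchnorm uses running stats) while autograd still records operations; torch.no_grad() disables gradient tracking but leaves module behavior untouched. A minimal CPU-only sketch (using a plain nn.Dropout rather than the RNN above, so it runs anywhere):

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(10, requires_grad=True)

# eval(): dropout becomes the identity, but gradients are still tracked
drop.eval()
out_eval = drop(x)
assert torch.equal(out_eval, x)   # no elements were zeroed
assert out_eval.requires_grad     # autograd is still active

# no_grad(): dropout is active again (train mode), but nothing is recorded
drop.train()
with torch.no_grad():
    out_ng = drop(x)
assert not out_ng.requires_grad   # no graph was built
```

So in your RNN case, eval() (or dropout=0.0) is what actually turns the dropout off; wrapping the forward pass in torch.no_grad() alone would not.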