'model.eval()' vs 'with torch.no_grad()'

Yes, that would be the case.
You could either set the .dropout attribute to zero or disable cuDNN for this layer (then you wouldn't need to call train() on it, but you might see a slowdown):

import torch
import torch.nn as nn

# Option 1: zero out the dropout attribute directly
rnn = nn.RNN(10, 20, 2, dropout=0.5).cuda()
rnn.dropout = 0.0
input = torch.randn(5, 3, 10).cuda()
h0 = torch.randn(2, 3, 20).cuda()
for _ in range(2):
    output, hn = rnn(input, h0)
    output.mean().backward()
    print(output.mean())


# Option 2: call eval() and disable cuDNN for the forward pass
rnn = nn.RNN(10, 20, 2, dropout=0.5).cuda().eval()
input = torch.randn(5, 3, 10).cuda()
h0 = torch.randn(2, 3, 20).cuda()
for _ in range(2):
    with torch.backends.cudnn.flags(enabled=False):
        output, hn = rnn(input, h0)
    output.mean().backward()
    print(output.mean())
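As a side note on the thread title: model.eval() and torch.no_grad() are orthogonal. eval() switches module behavior (dropout becomes a no-op, batchnorm uses running stats) while autograd still records operations; torch.no_grad() disables gradient tracking but leaves module behavior untouched. A minimal CPU-only sketch (using a plain nn.Dropout rather than the RNN above, so it runs anywhere):

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(10, requires_grad=True)

# eval(): dropout becomes the identity, but gradients are still tracked
drop.eval()
out_eval = drop(x)
assert torch.equal(out_eval, x)   # no elements were zeroed
assert out_eval.requires_grad     # autograd is still active

# no_grad(): dropout is active again (train mode), but nothing is recorded
drop.train()
with torch.no_grad():
    out_ng = drop(x)
assert not out_ng.requires_grad   # no graph was built
```

So in your RNN case, eval() (or dropout=0.0) is what actually turns the dropout off; wrapping the forward pass in torch.no_grad() alone would not.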