When should we set torch.backends.cudnn.enabled to False, especially for LSTM?

Hi all,

I’m using Captum to compute integrated gradients for an LSTM model. If I just load the model and the weights, it raises the error "cudnn RNN backward can only be called in training mode". I tried removing model.eval(), but that doesn’t give the correct result.

I also tried putting all the data and the model on the CPU, and that works fine, but slowly. I’m not familiar with how PyTorch calculates the LSTM weights, but according to the following thread, https://github.com/pytorch/captum/issues/564#issuecomment-748274352 , adding torch.backends.cudnn.enabled=False fixed the problem.
I noticed the GPU is still being used with cudnn.enabled=False.

My question is: what does backends.cudnn.enabled mean here, and why does the LSTM require it to be disabled? When should we be concerned about it or use it? Thanks!
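
For reference, here is a minimal sketch of the kind of setup that triggers the error on a GPU (the layer sizes and shapes are made up, not my real model):

import torch
import torch.nn as nn

# Toy stand-in for the real model: a single cudnn-backed LSTM on the GPU
lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True).cuda()
lstm.eval()  # eval mode + cudnn RNN is the combination that breaks backward

x = torch.randn(4, 10, 8, device="cuda", requires_grad=True)
out, _ = lstm(x)
# RuntimeError: cudnn RNN backward can only be called in training mode
out.sum().backward()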

backends.cudnn.enabled enables cudnn for some operations such as conv layers and RNNs, which can yield a significant speedup.
The cudnn RNN implementation doesn’t support the backward operation during eval() and thus raises the error. You could disable cudnn for your workload (as already done) or try to call .train() on the RNN module separately after using model.eval().
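
For example, a rough sketch (the toy model, shapes, and target are placeholders for your own code; only the torch.backends.cudnn lines are the actual workaround):

import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

class TinyLSTM(nn.Module):  # placeholder model, not your real one
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
        self.fc = nn.Linear(16, 2)

    def forward(self, x):
        out, _ = self.lstm(x)
        return self.fc(out[:, -1])

model = TinyLSTM().cuda().eval()
ig = IntegratedGradients(model)
x = torch.randn(4, 10, 8, device="cuda")

# Option 1: disable cudnn globally; the GPU is still used, just via the
# native (non-cudnn) RNN kernels, which do support backward in eval mode
torch.backends.cudnn.enabled = False
attributions = ig.attribute(x, target=0)

# Option 2: disable cudnn only around the attribution call
with torch.backends.cudnn.flags(enabled=False):
    attributions = ig.attribute(x, target=0)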

Hi thanks for your reply.

I’m using Captum for integrated gradients. I have tried both setting backends.cudnn.enabled=False and avoiding model.eval(), but these two methods give different results. Will it affect the gradients for the LSTM if we remove model.eval()?

Thanks.

I think it would affect the gradients (assuming you could disable cudnn for the RNN), since e.g. dropout layers won’t be used and batchnorm layers will normalize the activations with their running stats instead of the batch statistics.
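
As a rough illustration with toy layers (not your model), the same input can get different input gradients in the two modes:

import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(4, 4), nn.BatchNorm1d(4),
                    nn.Dropout(p=0.5), nn.Linear(4, 1))
x = torch.randn(8, 4, requires_grad=True)

# train() mode: dropout is active and batchnorm normalizes with batch stats
net.train()
net(x).sum().backward()
grad_train = x.grad.clone()

# eval() mode: dropout is a no-op and batchnorm uses its running stats,
# so the input gradients generally differ from the ones above
x.grad = None
net.eval()
net(x).sum().backward()
grad_eval = x.grad.clone()

print(torch.allclose(grad_train, grad_eval))  # typically False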

Ah, that makes sense. So we should disable cudnn when calculating gradients for an RNN. Thanks!

Still, could you please explain in more detail why the cudnn RNN does not support the backward operation in eval mode?

It’s a limitation of the cudnn implementation.
From the cudnn docs:

fwdMode Input. Specifies inference or training mode (CUDNN_FWD_MODE_INFERENCE and CUDNN_FWD_MODE_TRAINING). In the training mode, additional data is stored in the reserve space buffer. This information is used in the backward pass to compute derivatives.

Thus if you call model.eval(), you won’t be able to calculate the gradients using the cudnn RNN implementation. As described before, calling:

model.eval()             # put the whole model into eval mode
model.rnn_layer.train()  # switch only the RNN submodule back to training mode

could solve this issue, as it would keep the RNN in training mode.
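
One caveat: if the LSTM itself was created with dropout > 0, keeping that submodule in training mode also keeps its internal dropout active, so repeated attribution runs won’t be deterministic. Disabling cudnn while staying in eval() mode avoids that.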