[SOLVED] RTX 2080 Ti: cuDNN error when loading an RNN model onto the GPU

Hello everyone,

I have created a simple RNN network which runs on the CPU without any problems. However, when I load the model onto the GPU, I get the following error:

RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

Here is the code of my RNN class:

import torch
import torch.nn as nn


class RNN_Netz(nn.Module):
    def __init__(self, Input_Num, Output_Num, Hidden_Num, Layer_Num):
        super(RNN_Netz, self).__init__()
        self.Input_Num = Input_Num
        self.Output_Num = Output_Num
        self.Hidden_Num = Hidden_Num
        self.Layer_Num = Layer_Num

        self.rnn = nn.RNN(input_size=Input_Num,
                          hidden_size=Hidden_Num,
                          num_layers=Layer_Num,
                          nonlinearity='relu',
                          bias=True,
                          batch_first=True,
                          dropout=0.1)

        self.linear = nn.Linear(Hidden_Num, Output_Num)

    def forward(self, x):
        h_init = self.init_hidden(x)
        # x has shape (batch, seq_len); add a feature dimension so it
        # matches the (batch, seq_len, input_size) layout expected with
        # batch_first=True.
        input_tensor = x.view(x.size(0), x.size(1), 1)
        output, hn = self.rnn(input_tensor, h_init)
        # Output of the last timestep for every sample in the batch.
        last_output = output[:, -1]
        result = self.linear(last_output)
        return result

    def init_hidden(self, x):
        # Initial hidden state of shape (num_layers * num_directions,
        # batch, hidden_size), created on the same device as the input.
        h_init = torch.zeros(self.Layer_Num, x.size(0), self.Hidden_Num,
                             device=x.device, dtype=torch.float)
        return h_init

And here is the relevant code showing how I call the class:

device = torch.device('cuda:1' if torch.cuda.is_available() else 'cpu')

train_data = torch.utils.data.TensorDataset(train_features, train_targets)
kwargs = {'num_workers': 2, 'pin_memory': True}
loader = torch.utils.data.DataLoader(dataset=train_data, batch_size=8000,
                                     shuffle=True, drop_last=True, **kwargs)
model = RNN_Netz(Input_Num=1, Output_Num=1, Hidden_Num=50, Layer_Num=2)
model = model.to(device)

The error message appears when I move the model onto the GPU, i.e. at model = model.to(device).

Here is the full traceback:

Traceback (most recent call last):
  File "Main_RNN.py", line 36, in <module>
    run_trainings_process_RNN(1000, 8000)
  File "Main_RNN.py", line 34, in run_trainings_process_RNN
    RNN_class.train(train_features, train_targets, Epochen, Batch_size, 0.001, 1, 50, test_features, test_targets)
  File "/home/simtower2/Babak/Pytorch_LSTM/RNN_Net.py", line 108, in train
    model = model.to(device)
  File "/home/simtower2/Babak/Env/Pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 381, in to
    return self._apply(convert)
  File "/home/simtower2/Babak/Env/Pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 187, in _apply
    module._apply(fn)
  File "/home/simtower2/Babak/Env/Pytorch/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 117, in _apply
    self.flatten_parameters()
  File "/home/simtower2/Babak/Env/Pytorch/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 113, in flatten_parameters
    self.batch_first, bool(self.bidirectional))
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

Previously, I had trained feed-forward networks and LSTM networks built from torch.nn.LSTMCell on the GPU without any problems.

Therefore, I think it must be a problem specific to torch.nn.RNN. I also changed the code to use torch.nn.LSTM, but the error appears at the same place.

GPU: RTX 2080 Ti, PyTorch 1.0

I’m grateful for any help!

Update:
I tried it again on another computer with a Tesla K20 GPU and it worked.

Does this error message appear because of the RTX 2080 Ti?

Which cuDNN version are you using? Maybe it’s too old for your GPU.
Could you try to run your code with torch.backends.cudnn.enabled = False on your RTX?
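For reference, a minimal sketch of both steps (the version check and the cuDNN switch), assuming a stock PyTorch binary:

import torch

# cuDNN version bundled with the PyTorch binary, e.g. 7401 (= 7.4.1)
print(torch.backends.cudnn.version())

# Disable cuDNN as a diagnostic; RNNs then fall back to the native
# (usually slower) implementation.
torch.backends.cudnn.enabled = False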

Thank you for your quick response!

torch.backends.cudnn.version() shows version 7401. With torch.backends.cudnn.enabled = False, it runs without any error.

But I noticed that the speed is very slow, especially compared to my previous LSTMCell implementation. The same dataset with the same batch size runs much faster with the LSTMCell implementation.

How can that be? Did I do something wrong with the torch.nn.RNN implementation in my first post, or is it still due to the graphics card?

I thought that dropping the explicit loop over the timesteps from the forward pass should be more efficient. That was my main motivation for switching to the torch.nn.LSTM / torch.nn.RNN implementation.

Here is my previous code with the LSTMCell implementation, which runs faster:

import torch
import torch.nn as nn


class LSTM_Netz(nn.Module):
    def __init__(self, Input_Num, Output_Num, Hidden_Num, Layer_Num):
        super(LSTM_Netz, self).__init__()
        self.Input_Num = Input_Num
        self.Output_Num = Output_Num
        self.Hidden_Num = Hidden_Num
        self.Layer_Num = Layer_Num

        self.lstm1 = nn.LSTMCell(Input_Num, Hidden_Num)
        self.lstm2 = nn.LSTMCell(Hidden_Num, Hidden_Num)
        self.linear = nn.Linear(Hidden_Num, 1)

    def forward(self, x):
        outputs = []
        h_t, c_t, h_t2, c_t2 = self.init_hidden(x)

        # Step manually through the sequence, one chunk of shape
        # (batch, 1) per timestep.
        for input_t in x.chunk(x.size(1), dim=1):
            h_t, c_t = self.lstm1(input_t, (h_t, c_t))
            h_t2, c_t2 = self.lstm2(h_t, (h_t2, c_t2))
            outputs.append(self.linear(h_t2))

        outputs = torch.stack(outputs, 1).squeeze(2)
        # Return the prediction of the last timestep.
        return outputs[:, -1]

    def init_hidden(self, x):
        # Hidden and cell states for both layers, each of shape
        # (batch, hidden_size), created on the same device as the input.
        dtype = torch.float
        h_t = torch.zeros(x.size(0), self.Hidden_Num, dtype=dtype, device=x.device)
        c_t = torch.zeros(x.size(0), self.Hidden_Num, dtype=dtype, device=x.device)
        h_t2 = torch.zeros(x.size(0), self.Hidden_Num, dtype=dtype, device=x.device)
        c_t2 = torch.zeros(x.size(0), self.Hidden_Num, dtype=dtype, device=x.device)
        return h_t, c_t, h_t2, c_t2
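For reference, here is a minimal timing sketch for comparing the two variants; the input shape (8000 sequences, 50 timesteps, one feature) is just an assumption for illustration:

import time

import torch

def time_model(model, x, n_iters=10):
    torch.cuda.synchronize()  # flush pending kernels before timing
    start = time.time()
    with torch.no_grad():
        for _ in range(n_iters):
            model(x)
    torch.cuda.synchronize()  # CUDA calls are asynchronous
    return (time.time() - start) / n_iters

x = torch.randn(8000, 50, device='cuda')

rnn_net = RNN_Netz(Input_Num=1, Output_Num=1, Hidden_Num=50, Layer_Num=2).to('cuda')
lstm_net = LSTM_Netz(Input_Num=1, Output_Num=1, Hidden_Num=50, Layer_Num=2).to('cuda')

print('nn.RNN:   %.4f s/iter' % time_model(rnn_net, x))
print('LSTMCell: %.4f s/iter' % time_model(lstm_net, x))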

That’s expected, as cuDNN often provides a significant speedup.
As I don’t have an RTX card myself, I’m just guessing that a version mismatch might be throwing this error.
Which CUDA version are you using?

Maybe @ngimel might help here.

CUDA 10 is installed.

Just to mention that I’m also running into this error on a 2080 Ti, but not on a K40, when using PyTorch 1.0. It’s the same code and environment; the only difference is that it breaks on the 2080 Ti and throws:

Traceback (most recent call last):
  File "/home/henrye/downloads/Henry_OpenNMT-py/train.py", line 120, in <module>
    main(opt)
  File "/home/henrye/downloads/Henry_OpenNMT-py/train.py", line 51, in main
    single_main(opt, 0)
  File "/home/henrye/downloads/Henry_OpenNMT-py/onmt/train_single.py", line 131, in main
    model = build_model(model_opt, opt, fields, checkpoint)
  File "/home/henrye/downloads/Henry_OpenNMT-py/onmt/model_builder.py", line 301, in build_model
    model = build_base_model(model_opt, fields, use_gpu(opt), checkpoint)
  File "/home/henrye/downloads/Henry_OpenNMT-py/onmt/model_builder.py", line 294, in build_base_model
    model.to(device)
  File "/home/henrye/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 381, in to
    return self._apply(convert)
  File "/home/henrye/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 187, in _apply
    module._apply(fn)
  File "/home/henrye/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 187, in _apply
    module._apply(fn)
  File "/home/henrye/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 117, in _apply
    self.flatten_parameters()
  File "/home/henrye/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 113, in flatten_parameters
    self.batch_first, bool(self.bidirectional))
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

I have reinstalled the machine, and it now also works on the RTX without error messages. The nn.RNN implementation still runs much slower than the LSTMCell variant with a loop.

I get this same issue on an RTX 2080 Ti. I am using CUDA 9.0 and PyTorch 1.0, and I have tried both cuDNN 7.0 and 7.4. I have two questions. First, the error is resolved on the second attempt: when I enclose model.to(device) in a try/except and run it again on the exception, it works. This seems quite strange, and it also happened when I tried it from the Python shell. Second, what does torch.backends.cudnn.version() actually report? I ran it with both cuDNN 7.0 and 7.4 installed, and both times I got 7401. Thanks.
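Here is a sketch of the retry workaround I described; model stands for any module containing an nn.RNN or nn.LSTM:

import torch

device = torch.device('cuda')

try:
    model = model.to(device)
except RuntimeError:
    # The first attempt fails with CUDNN_STATUS_EXECUTION_FAILED on the
    # RTX 2080 Ti; oddly, the second attempt then succeeds.
    model = model.to(device)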

Reinstalling worked :man_shrugging:

Did you do a complete CUDA/cuDNN reinstallation, or just create a fresh virtual environment? I am also curious whether the error persists when installing torch from pip as opposed to conda.

I had the same issue. Reinstalling torch via the following command solved my problem:

pip install https://download.pytorch.org/whl/cu100/torch-1.0.1.post2-cp35-cp35m-linux_x86_64.whl

Go to the PyTorch website and choose the build that matches your CUDA version:

cu100 = CUDA 10.0

pip3 uninstall torch
pip3 install https://download.pytorch.org/whl/cu100/torch-1.0.1.post2-cp36-cp36m-linux_x86_64.whl

I have the same issue as @ddhruvkr:
I have PyTorch 1.1, CUDA 9.0, cuDNN 7, Ubuntu 18.04, and an RTX 2080 Ti. When I move my model with .cuda(), I get cuDNN error: CUDNN_STATUS_EXECUTION_FAILED.
And the model loads without error when I run the code a second time.

There is a warning before the stack trace (which I had somehow missed):

UserWarning: 
    Found GPU0 GeForce RTX 2080 Ti which requires CUDA_VERSION >= 10000 to
     work properly, but your PyTorch was compiled
     with CUDA_VERSION 9000. Please install the correct PyTorch binary
     using instructions from https://pytorch.org

Soumith said here:

CUDA 9 and the RTX 2080 Ti simply aren’t compatible and don’t play well together.
An older cuDNN version working is likely a side effect rather than an expectation.
Use CUDA 10 and the CUDA 10 versions of cuDNN etc. for the RTX 2080, which is the Turing architecture.

Make sure that the CUDA version is 10.
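A quick sanity check for this, as a sketch:

import torch

print(torch.version.cuda)                   # CUDA version PyTorch was built with, e.g. '10.0'
print(torch.backends.cudnn.version())       # bundled cuDNN version, e.g. 7603
print(torch.cuda.get_device_name(0))        # e.g. 'GeForce RTX 2080 Ti'
print(torch.cuda.get_device_capability(0))  # Turing reports (7, 5)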

I can confirm this works on Ubuntu 18.04 LTS:

  1. Ubuntu 18.04 LTS
  2. CUDA 10.2
  3. CuDNN 7.6.1
  4. Nvidia TITAN RTX x4

However, I downgraded from Python 3.7 to 3.6.9 and installed PyTorch 1.0.0 using pip.

I’m getting this same issue with:

  1. Ubuntu 16.04
  2. CUDA 10.1
  3. CuDNN 7602
  4. Titan X
  5. Nvidia driver 418.87.00
  6. Python 3.7.4
  7. PyTorch 1.2.0

@chenjus, did you try downgrading to the versions I mentioned above?

I am getting this error with:

Ubuntu 16.04
torch 1.8.0+cu111
Python 3.7.10
NVIDIA-SMI 450.80.02
Driver Version: 450.80.02
CUDA Version: 11.0
CUDNN_VERSION 7501

Here’s the entire Conda environment.

Additional information:
The error occurs for me during the loss.backward() call.

I’m not sure why this error occurs somewhere in the middle of an epoch; it works fine sometimes and fails at other times. Any suggestions or comments are welcome.
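Since CUDA ops run asynchronously, the loss.backward() line in the traceback is not necessarily where the failure actually happens. A common way to localize such intermittent failures is to force synchronous kernel launches, for example:

import os

# Must be set before the first CUDA call in the process; with blocking
# launches the traceback points at the op that actually failed.
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'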

Switching to Ubuntu 18.04, torch 1.8.1, NVIDIA-SMI 450.119.03 solved the problem for me.