Got 'RuntimeError: CUDNN_STATUS_INTERNAL_ERROR' in nn/functional.py conv3d()

JustnLiu · August 16, 2018, 4:11am

Hi,

I was trying to train GoogLeNet on a server with multiple GPUs using Python 3 and PyTorch 0.3.1.post2. However, I keep getting this error: ‘RuntimeError: CUDNN_STATUS_INTERNAL_ERROR’. It seems to happen when conv3d() is called. The complete error message is below. Could anyone help me out with it?

Thanks

  File "main.py", line 220, in class_reg_eval
    outputs = combined_classifier.net(event_data)
  File "/home/junzel2/anaconda2/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/junzel2/Triforce_CaloML/Architectures/GoogLeNet.py", line 142, in forward
    x = self.pre_layers(x)
  File "/home/junzel2/anaconda2/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/junzel2/anaconda2/envs/py36/lib/python3.6/site-packages/torch/nn/modules/container.py", line 67, in forward
    input = module(input)
  File "/home/junzel2/anaconda2/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/junzel2/anaconda2/envs/py36/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 388, in forward
    self.padding, self.dilation, self.groups)
  File "/home/junzel2/anaconda2/envs/py36/lib/python3.6/site-packages/torch/nn/functional.py", line 126, in conv3d
    return f(input, weight, bias)
RuntimeError: CUDNN_STATUS_INTERNAL_ERROR
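The traceback shows the failing path: the model's `pre_layers` (an `nn.Sequential`) calls `Conv3d.forward`, which calls `F.conv3d`. A minimal sketch for checking whether a bare 3-D convolution fails on the same machine, independent of the rest of the model, assuming PyTorch 0.4 or later (for `torch.device` and `.to()`). The layer sizes and input shape are hypothetical, since `GoogLeNet.py` is not shown.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the poster's `pre_layers`: the real GoogLeNet.py
# is not shown, so the channel counts and kernel size here are assumptions.
pre_layers = nn.Sequential(
    nn.Conv3d(in_channels=1, out_channels=8, kernel_size=3, padding=1),
    nn.BatchNorm3d(8),
    nn.ReLU(inplace=True),
)

# Use the GPU when present so the convolution goes through cuDNN, as in the traceback.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
pre_layers = pre_layers.to(device)

# Conv3d expects a 5-D input: (batch, channels, depth, height, width).
event_data = torch.randn(2, 1, 16, 16, 16, device=device)
out = pre_layers(event_data)
print(out.shape)  # torch.Size([2, 8, 16, 16, 16])
```

If this small script already raises `CUDNN_STATUS_INTERNAL_ERROR` on the GPU, the problem is in the environment (driver, CUDA, cuDNN, or the PyTorch build) rather than in the model code.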

Updates:
I have tried several solutions I found on Google, including ‘rm -rf ~/.nv’ and ‘torch.backends.cudnn.benchmark = False’, but none of them worked.
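Another way to check whether cuDNN itself is at fault is to run the same 3-D convolution with cuDNN switched off via `torch.backends.cudnn.enabled` and compare the results. A minimal sketch, assuming PyTorch 0.4 or later and hypothetical tensor shapes; disabling cuDNN is a diagnostic step (the non-cuDNN path is usually slower), not one of the fixes mentioned in this thread.

```python
import torch
import torch.nn.functional as F

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical input, weight, and bias; shapes follow F.conv3d's convention:
# input (N, C_in, D, H, W), weight (C_out, C_in, kD, kH, kW).
x = torch.randn(2, 1, 8, 8, 8, device=device)
w = torch.randn(4, 1, 3, 3, 3, device=device)
b = torch.randn(4, device=device)

def run_conv3d(use_cudnn):
    """Run conv3d with cuDNN enabled or disabled, then restore the old setting."""
    previous = torch.backends.cudnn.enabled
    torch.backends.cudnn.enabled = use_cudnn
    try:
        return F.conv3d(x, w, b, padding=1)
    finally:
        torch.backends.cudnn.enabled = previous

out_native = run_conv3d(use_cudnn=False)  # PyTorch's own kernel (CUDA or CPU)
out_cudnn = run_conv3d(use_cudnn=True)    # cuDNN path; only differs on a GPU

# If only the cuDNN call raises, cuDNN or its installation is the likely culprit.
# Small numeric differences are normal because cuDNN may choose other algorithms.
max_diff = (out_native - out_cudnn).abs().max().item()
print("max abs difference:", max_diff)
```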

ptrblck · August 16, 2018, 9:59am

Is your code running on the CPU?
Also, could you update to the latest release and check again, or do you need this PyTorch version for some reason?
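Before and after upgrading, it can help to record exactly which versions are in use. A minimal sketch using attributes available in recent PyTorch releases: `torch.version.cuda` is the CUDA version PyTorch was built with, and `torch.backends.cudnn.version()` is the cuDNN version PyTorch reports; neither is necessarily the system-wide installation.

```python
import torch

# Versions that matter for cuDNN errors: the PyTorch release, the CUDA
# toolkit it was built with, and the cuDNN library it uses.
print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("CUDA (build):", torch.version.cuda)        # None on CPU-only builds
print("cuDNN:", torch.backends.cudnn.version())   # None if cuDNN is unavailable

# On a multi-GPU server, list every visible device.
if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        print("GPU %d: %s" % (i, torch.cuda.get_device_name(i)))
```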

JustnLiu · August 17, 2018, 12:59am

Thanks for your reply.
The code is running on GPUs. I have just updated PyTorch to 0.4.1, and the ‘CUDNN_STATUS_INTERNAL_ERROR’ is gone. My GoogLeNet was working fine on 0.3.1 before and only started failing recently. It is a weird error, though.
Thanks for your suggestion; it runs now.