Got 'RuntimeError: CUDNN_STATUS_INTERNAL_ERROR' in torch/nn/functional.py conv3d()

Hi,

I was trying to train GoogLeNet on a multi-GPU server using Python 3 and PyTorch 0.3.1.post2. However, I keep getting this error: ‘RuntimeError: CUDNN_STATUS_INTERNAL_ERROR’. It seems to happen when conv3d() is called. The complete error message is below. Could anyone help me out with it?

Thanks

  File "main.py", line 220, in class_reg_eval
    outputs = combined_classifier.net(event_data)
  File "/home/junzel2/anaconda2/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/junzel2/Triforce_CaloML/Architectures/GoogLeNet.py", line 142, in forward
    x = self.pre_layers(x)
  File "/home/junzel2/anaconda2/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/junzel2/anaconda2/envs/py36/lib/python3.6/site-packages/torch/nn/modules/container.py", line 67, in forward
    input = module(input)
  File "/home/junzel2/anaconda2/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/junzel2/anaconda2/envs/py36/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 388, in forward
    self.padding, self.dilation, self.groups)
  File "/home/junzel2/anaconda2/envs/py36/lib/python3.6/site-packages/torch/nn/functional.py", line 126, in conv3d
    return f(input, weight, bias)
RuntimeError: CUDNN_STATUS_INTERNAL_ERROR
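To check whether conv3d fails on its own, outside the training script, a small standalone repro can help. The sketch below assumes the PyTorch 0.4+ API (`torch.no_grad`, `.to(device)`), which 0.3.1 does not have; `conv3d_check` and the layer sizes are hypothetical and not taken from GoogLeNet.py. It computes a CPU reference and compares the output on the chosen device against it.

```python
import torch
import torch.nn as nn

def conv3d_check(device, in_channels=1, out_channels=4, size=8):
    """Hypothetical helper: run one Conv3d on the CPU and on `device`, return both outputs."""
    torch.manual_seed(0)  # same weights and input on every run
    conv = nn.Conv3d(in_channels, out_channels, kernel_size=3)
    x = torch.randn(1, in_channels, size, size, size)  # (batch, channels, depth, height, width)
    with torch.no_grad():
        cpu_out = conv(x)  # CPU reference, computed before the layer is moved
        # On a GPU this call is typically dispatched to cuDNN.
        dev_out = conv.to(device)(x.to(device)).cpu()
    return cpu_out, dev_out

device = "cuda" if torch.cuda.is_available() else "cpu"
cpu_out, dev_out = conv3d_check(device)
print(cpu_out.shape)  # a 3x3x3 kernel without padding trims 2 from each spatial axis
# Loose tolerance: GPU kernels may round differently from the CPU.
print(torch.allclose(cpu_out, dev_out, atol=1e-2))
```

If this tiny case already raises CUDNN_STATUS_INTERNAL_ERROR on the GPU, the model code is probably not the culprit.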

Updates:
I have tried several solutions I found on Google, including ‘rm -rf ~/.nv’ and ‘torch.backends.cudnn.benchmark = False’, but none of them worked.
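Besides `benchmark`, the flag `torch.backends.cudnn.enabled` can be set to `False`, which makes PyTorch use its own convolution kernels instead of cuDNN. If the same forward pass then succeeds, the problem most likely lies in cuDNN (or its interaction with the driver) rather than in the model. A minimal sketch, assuming PyTorch 0.4+; `run_without_cudnn` is a hypothetical helper, and the non-cuDNN path is usually slower, so this is for diagnosis rather than training.

```python
import torch
import torch.nn as nn

def run_without_cudnn(fn, *args):
    """Hypothetical helper: call fn(*args) with cuDNN disabled, then restore the old setting."""
    previous = torch.backends.cudnn.enabled
    torch.backends.cudnn.enabled = False  # use PyTorch's native (non-cuDNN) kernels
    try:
        return fn(*args)
    finally:
        torch.backends.cudnn.enabled = previous  # restored even if fn raises

device = "cuda" if torch.cuda.is_available() else "cpu"
conv = nn.Conv3d(1, 4, kernel_size=3).to(device)
x = torch.randn(1, 1, 8, 8, 8, device=device)
with torch.no_grad():
    y = run_without_cudnn(conv, x)
print(y.shape)
```

Using try/finally keeps the global flag from staying off if the wrapped call itself throws.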

Is your code running on CPU?
Also, could you update to the latest release and check again, or do you need this PyTorch version for some reason?
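When comparing behaviour across releases, it helps to record exactly which versions are in use. PyTorch ships a script for this (`python -m torch.utils.collect_env`); the sketch below gathers a few of the same fields in code, assuming a reasonably recent PyTorch. `environment_report` is a hypothetical helper name.

```python
import sys
import torch

def environment_report():
    """Hypothetical helper: collect version details that usually matter for cuDNN errors."""
    report = {
        "python": sys.version.split()[0],
        "torch": torch.__version__,
        "cuda_build": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
        "gpu_count": torch.cuda.device_count(),  # 0 when no GPU is visible
    }
    # Only query the cuDNN version if the library can actually be loaded.
    report["cudnn"] = torch.backends.cudnn.version() if torch.backends.cudnn.is_available() else None
    return report

for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Posting this output alongside a traceback makes it easier for others to tell whether a version mismatch is involved.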

Thanks for your reply.
The code is running on GPUs. I have just updated PyTorch to 0.4.1, and ‘CUDNN_STATUS_INTERNAL_ERROR’ is gone. My GoogLeNet had been working fine on 0.3.1 and only started failing recently, so it is a weird error, though.
Thanks for your suggestion; it runs now.