My network trains without any errors using CUDA. During inference, however, I randomly get a "cuDNN error: CUDNN_STATUS_MAPPING_ERROR" triggered by `torch/nn/modules/conv.py`, line 420, in `_conv_forward` (called from a standard ResNet).
I checked the input and everything looks fine. I also tried running it on the CPU, hoping for a more helpful error message, but on the CPU it runs as expected and I cannot reproduce the error.
Any ideas on what I can try to debug this error?
Additional info: I am using PyTorch 1.7.1 with CUDA 11.0 and cuDNN 8.0.5.
Traceback:
```
  File "/home/michael/miniconda3/envs/realistic-hands/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/michael/master/realistic-hands/models/resnet.py", line 186, in forward
    x = self.conv1(x)
  File "/home/michael/miniconda3/envs/realistic-hands/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/michael/miniconda3/envs/realistic-hands/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 423, in forward
    return self._conv_forward(input, self.weight)
  File "/home/michael/miniconda3/envs/realistic-hands/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 420, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_MAPPING_ERROR
```
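One way to narrow this down is to run the failing layer three ways: on the CPU, on the GPU with cuDNN, and on the GPU with cuDNN disabled (`torch.backends.cudnn.enabled = False`). If only the cuDNN path fails, that points at cuDNN rather than the input. This is a rough sketch, not a known fix: `compare_conv_backends` is a hypothetical helper, the 64-channel `conv1` in the usage part is an assumed stand-in for the real layer, and after a genuine CUDA error the CUDA context may be unusable, so the second GPU run can fail too.

```python
import copy
import os

# Make CUDA kernel launches synchronous so an error is reported at the op that
# caused it. This only has an effect if it is set before CUDA is initialized.
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")

import torch
import torch.nn as nn


def compare_conv_backends(conv: nn.Module, x: torch.Tensor) -> dict:
    """Run one layer on CPU, on GPU with cuDNN, and on GPU with cuDNN disabled."""
    layer = copy.deepcopy(conv).eval()  # leave the caller's module untouched
    with torch.no_grad():
        ref = layer.cpu()(x.cpu())
    report = {"cpu_shape": tuple(ref.shape)}
    if not torch.cuda.is_available():
        return report  # nothing to compare against on a CPU-only machine

    layer = layer.cuda()
    x_gpu = x.cuda()
    previous = torch.backends.cudnn.enabled
    try:
        for use_cudnn in (True, False):
            torch.backends.cudnn.enabled = use_cudnn
            key = "cudnn" if use_cudnn else "no_cudnn"
            try:
                with torch.no_grad():
                    out = layer(x_gpu)
                torch.cuda.synchronize()  # surface asynchronous errors here
                # Max absolute deviation from the CPU reference.
                report[key] = (out.cpu() - ref).abs().max().item()
            except RuntimeError as err:
                report[key] = f"error: {err}"
    finally:
        torch.backends.cudnn.enabled = previous  # restore the global flag
    return report


if __name__ == "__main__":
    # Assumed stand-in for the ResNet stem; the real out_channels may differ.
    conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
    print(compare_conv_backends(conv1, torch.randn(1, 3, 224, 224)))
```

Small numeric differences in the "cudnn" entry are expected (e.g. TF32 on newer GPUs); an "error: ..." string is the interesting case.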
This sounds like an internal cuDNN issue.
Could you post the conv layer definition (or the complete model) as well as the input shapes that trigger this error?
Also, which GPU are you using?
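If the failure only shows up for some inputs, one way to capture the requested shapes is to attach forward pre-hooks to every `nn.Conv2d` and record what each layer receives right before it runs. A minimal sketch, assuming the hooks are attached to the real model; `log_conv_inputs` and the small `nn.Sequential` below are hypothetical stand-ins for illustration:

```python
import torch
import torch.nn as nn


def log_conv_inputs(model: nn.Module):
    """Attach forward pre-hooks that record what each Conv2d receives."""
    records = []
    handles = []

    def make_hook(name):
        def hook(module, inputs):
            x = inputs[0]
            records.append({
                "name": name,
                "layer": repr(module),
                "shape": tuple(x.shape),
                "dtype": str(x.dtype),
                "device": str(x.device),
                "contiguous": x.is_contiguous(),
            })
            # Returning None leaves the inputs unchanged.
        return hook

    for name, module in model.named_modules():
        if isinstance(module, nn.Conv2d):
            handles.append(module.register_forward_pre_hook(make_hook(name)))
    return records, handles


if __name__ == "__main__":
    # Hypothetical stand-in for the real ResNet stem.
    model = nn.Sequential(
        nn.Conv2d(3, 8, kernel_size=7, stride=2, padding=3, bias=False),
        nn.ReLU(),
        nn.Conv2d(8, 16, kernel_size=3, padding=1),
    )
    records, handles = log_conv_inputs(model)
    with torch.no_grad():
        model(torch.randn(1, 3, 224, 224))
    for r in records:
        print(r)
    for h in handles:
        h.remove()  # detach the hooks once the information is collected
```

Printing the last record before the exception gives the exact layer and input that hit the error.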
Thanks for the information. As `self.inplane` is not defined, I've swept the number of output channels from 1 to 256 using this code snippet on a GTX 1060:
```python
import torch
import torch.nn as nn

print(torch.cuda.get_device_name())
print(torch.__version__)
print(torch.version.cuda)
print(torch.backends.cudnn.version())

for i in range(1, 257):
    conv = nn.Conv2d(3, i, kernel_size=7, stride=2, padding=3, bias=False).cuda()
    x = torch.randn(1, 3, 224, 224).cuda()
    out = conv(x)
    print(out.shape)
```
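Since the error appears "randomly", a single batch size and resolution may not be enough to trigger it. A possible extension of the sweep above also varies batch size and input resolution and collects failing configurations instead of stopping at the first one. This is only a sketch: `sweep_conv1` is a hypothetical helper, and the grid values in the usage part are arbitrary assumptions, not shapes known to fail.

```python
import itertools

import torch
import torch.nn as nn


def sweep_conv1(channels, batch_sizes, sizes, device="cpu"):
    """Run a ResNet-style 7x7/stride-2 stem conv over a grid of configurations."""
    dev = torch.device(device)
    failures = []
    runs = 0
    for out_ch, n, hw in itertools.product(channels, batch_sizes, sizes):
        conv = nn.Conv2d(3, out_ch, kernel_size=7, stride=2, padding=3, bias=False).to(dev)
        x = torch.randn(n, 3, hw, hw, device=dev)
        runs += 1
        try:
            with torch.no_grad():
                out = conv(x)
            if dev.type == "cuda":
                torch.cuda.synchronize()  # force asynchronous errors to surface here
            # Output size for k=7, s=2, p=3: (hw + 6 - 7) // 2 + 1 == (hw - 1) // 2 + 1
            side = (hw - 1) // 2 + 1
            if tuple(out.shape) != (n, out_ch, side, side):
                failures.append((out_ch, n, hw, f"unexpected shape {tuple(out.shape)}"))
        except RuntimeError as err:
            failures.append((out_ch, n, hw, str(err)))
    return runs, failures


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    runs, failures = sweep_conv1([1, 64, 256], [1, 4], [224, 257], device=device)
    print(f"{runs} runs, {len(failures)} failures")
    for f in failures:
        print(f)
```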
The simplest way to fix this issue is to use the right CUDA version (11.1). You can use the pip command below to install it before running your program.
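To confirm which CUDA version the installed PyTorch binary was actually built with, you can compare `torch.version.cuda` against the version recommended above. A minimal sketch, assuming 11.1 as the target (taken from the reply, not a verified minimum); the helper names are hypothetical, and this checks only the CUDA version PyTorch reports, not the system toolkit or driver.

```python
from typing import Optional, Tuple

import torch


def parse_cuda_version(v: Optional[str]) -> Optional[Tuple[int, int]]:
    """Turn a string like '11.0' into (11, 0); None means a CPU-only build."""
    if v is None:
        return None
    parts = v.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


def meets_cuda_requirement(v: Optional[str], required: Tuple[int, int] = (11, 1)) -> bool:
    """True if the reported CUDA version is at least `required`."""
    parsed = parse_cuda_version(v)
    return parsed is not None and parsed >= required


if __name__ == "__main__":
    print(torch.__version__, torch.version.cuda, torch.backends.cudnn.version())
    print("CUDA >= 11.1:", meets_cuda_requirement(torch.version.cuda))
```

For the setup in the original post (CUDA 11.0), this would report `False`.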