cuDNN error: CUDNN_STATUS_MAPPING_ERROR raised by Conv2d

Hi,

My network trains without any errors using CUDA. During inference, however, I randomly get a "cuDNN error: CUDNN_STATUS_MAPPING_ERROR" raised from torch/nn/modules/conv.py, line 420, in _conv_forward (called from a standard ResNet).

I checked the input and everything looks fine and I also tried to run it on the CPU to maybe get a more helpful error message, but on the CPU it runs as expected and I am not able to reproduce the error.

Any ideas on what I can try to debug this error?

Additional info: I am using PyTorch 1.7.1 with CUDA 11.0 and cuDNN 8.0.5.
Traceback:

  File "/home/michael/miniconda3/envs/realistic-hands/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/michael/master/realistic-hands/models/resnet.py", line 186, in forward
    x = self.conv1(x)
  File "/home/michael/miniconda3/envs/realistic-hands/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/michael/miniconda3/envs/realistic-hands/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 423, in forward
    return self._conv_forward(input, self.weight)
  File "/home/michael/miniconda3/envs/realistic-hands/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 420, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_MAPPING_ERROR

This sounds like an internal cudnn issue.
Could you post the conv layer definition (or complete model) as well as the input shapes, which trigger this error?
Also, which GPU are you using?
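In case it helps to gather these details in one go, here is a minimal sketch that prints the version and GPU information and records what each Conv2d layer actually receives. The helper names `describe_environment` and `log_conv_inputs` are made up for illustration, and the small `nn.Sequential` model only stands in for the real network; it falls back to the CPU when no GPU is present.

```python
import torch
import torch.nn as nn


def describe_environment():
    """Collect the version and device details (hypothetical helper)."""
    return {
        "torch": torch.__version__,
        "cuda": torch.version.cuda,
        "cudnn": torch.backends.cudnn.version(),
        "gpu": torch.cuda.get_device_name() if torch.cuda.is_available() else None,
    }


def log_conv_inputs(model, records):
    """Attach a forward pre-hook to every Conv2d that records its input (hypothetical helper)."""
    handles = []
    for name, module in model.named_modules():
        if isinstance(module, nn.Conv2d):
            # Bind the layer name as a default argument so each hook keeps its own name.
            def hook(mod, inputs, name=name):
                x = inputs[0]
                records.append({
                    "layer": name,
                    "definition": repr(mod),
                    "shape": tuple(x.shape),
                    "dtype": x.dtype,
                    "device": str(x.device),
                    "contiguous": x.is_contiguous(),
                })
            handles.append(module.register_forward_pre_hook(hook))
    return handles


# Stand-in model: a single convolution matching the first ResNet layer.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
).to(device)

records = []
handles = log_conv_inputs(model, records)
with torch.no_grad():
    model(torch.randn(1, 3, 224, 224, device=device))
for handle in handles:
    handle.remove()  # detach the hooks once the information is collected

print(describe_environment())
for record in records:
    print(record)
```

Since the error shows up randomly, leaving the hooks attached during inference and printing the last record when the exception is raised should show which input triggered it.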

The model is nearly identical to the torchvision resnet code. The error is raised at the first convolution of the ResNet model, namely:

x = self.conv1(x)

which is defined as:

self.conv1 = nn.Conv2d(3, self.inplanes, kernel_size=7, stride=2, padding=3, bias=False)

The input shape is (1, 224, 224, 3).
I use a GTX Titan X, but I also tested a GTX 1060 with the same result.

Thanks for the information. Since self.inplanes is not defined in your snippet, I looped over output channel counts from 1 to 256 with this code snippet on a GTX 1060:

import torch
import torch.nn as nn

print(torch.cuda.get_device_name())
print(torch.__version__)
print(torch.version.cuda)
print(torch.backends.cudnn.version())

for i in range(1, 257):
    conv = nn.Conv2d(3, i, kernel_size=7, stride=2, padding=3, bias=False).cuda()
    x = torch.randn(1, 3, 224, 224).cuda()

    out = conv(x)
    print(out.shape)

Output:

GeForce GTX 1060 6GB
1.7.1
11.0
8005
torch.Size([1, 1, 112, 112])
torch.Size([1, 2, 112, 112])
torch.Size([1, 3, 112, 112])
torch.Size([1, 4, 112, 112])
torch.Size([1, 5, 112, 112])
...
torch.Size([1, 256, 112, 112])

so I’m unfortunately not able to reproduce this issue.
I assume you’ve tested your code and it’s indeed failing using this particular convolution?
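One way to check this: CUDA kernels are launched asynchronously, so an error can be reported at a later call than the one that caused it. Running the script with the environment variable CUDA_LAUNCH_BLOCKING=1 forces synchronous launches. Alternatively, the rough sketch below synchronizes after every leaf module, so a pending error would surface at the layer that produced it. The helper name `add_sync_hooks` and the three-layer stand-in model are assumptions for illustration, and whether this applies to the failure reported here is not known.

```python
import torch
import torch.nn as nn


def add_sync_hooks(model, visited):
    """Synchronize with the GPU after each leaf module (hypothetical helper)."""
    handles = []
    for name, module in model.named_modules():
        if len(list(module.children())) > 0:
            continue  # skip containers, only hook leaf modules

        def hook(mod, inputs, output, name=name):
            if torch.cuda.is_available():
                # Pending kernel errors are raised here, right after this layer.
                torch.cuda.synchronize()
            visited.append(name)  # last entry = last layer that finished cleanly

        handles.append(module.register_forward_hook(hook))
    return handles


# Stand-in for the first layers of a ResNet.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(),
).to(device).eval()

visited = []
handles = add_sync_hooks(model, visited)
with torch.no_grad():
    out = model(torch.randn(1, 3, 224, 224, device=device))
for handle in handles:
    handle.remove()

print(visited)
print(out.shape)
```

If the exception is raised inside a hook, the layer after the last name in `visited` is the one to look at. The extra synchronization slows inference down, so it is only meant for debugging.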

Thanks for trying to reproduce the error!
I was able to fix the error by reinstalling CUDA and cuDNN.
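For anyone who wants to verify a fresh installation afterwards, a quick sanity check could look like the sketch below. It runs the same first convolution on the CPU and on the GPU and compares the results. The function name `check_conv_against_cpu` and the tolerance are assumptions chosen for illustration (the tolerance is deliberately loose to allow for reduced-precision GPU math), and the function returns None when no GPU is available.

```python
import torch
import torch.nn as nn


def check_conv_against_cpu(atol=1e-2):
    """Compare the GPU convolution result with the CPU result (hypothetical helper)."""
    torch.manual_seed(0)
    conv = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
    x = torch.randn(1, 3, 224, 224)

    with torch.no_grad():
        expected = conv(x)  # reference result computed on the CPU
        if not torch.cuda.is_available():
            return None  # nothing to compare against without a GPU
        actual = conv.cuda()(x.cuda()).cpu()  # same weights, run on the GPU

    return torch.allclose(actual, expected, atol=atol)


result = check_conv_against_cpu()
print("cuDNN version:", torch.backends.cudnn.version())
print("GPU matches CPU:", result)
```

Because the original error appeared randomly, a single passing run does not prove much; calling the check in a loop gives a bit more confidence.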