torch.backends.cudnn.benchmark and RuntimeError: cuDNN error

Hello,

I hope this is the right forum to post my issue.
During model training I received the following error message, which came with a code snippet for reproducing the error. Running that snippet on its own throws the same error.

RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
You can try to repro this exception using the following code snippet. If that doesn't trigger the error, please include your original repro script when reporting this issue.
import torch
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.benchmark = True
torch.backends.cudnn.deterministic = False
torch.backends.cudnn.allow_tf32 = True
data = torch.randn([1, 256, 128, 128], dtype=torch.float, device='cuda', requires_grad=True)
net = torch.nn.Conv2d(256, 256, kernel_size=[3, 3], padding=[1, 1], stride=[1, 1], dilation=[1, 1], groups=1)
net = net.cuda().float()
out = net(data)
out.backward(torch.randn_like(out))
torch.cuda.synchronize()

If I set

torch.backends.cudnn.benchmark = False

the error is not triggered. Originally, the error was triggered when I used transforms.RandomCrop(256) for the training data and transforms.RandomCrop(512) for the validation data. With the same crop size the error is not triggered.
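As a possible workaround, the flag could be switched off only around the part of the loop that uses the other crop size. The following is a minimal sketch, not a verified fix: `cudnn_benchmark` is a hypothetical helper name, and whether this avoids the error in the real training setup is an assumption I have not tested.

```python
import contextlib

import torch


@contextlib.contextmanager
def cudnn_benchmark(enabled):
    """Temporarily set torch.backends.cudnn.benchmark, restoring the old value on exit."""
    previous = torch.backends.cudnn.benchmark
    torch.backends.cudnn.benchmark = enabled
    try:
        yield
    finally:
        # Restore the previous setting even if the body raised.
        torch.backends.cudnn.benchmark = previous


torch.backends.cudnn.benchmark = True  # setting used for training

with cudnn_benchmark(False):
    # The validation loop (e.g. the RandomCrop(512) data) would go here.
    inside = torch.backends.cudnn.benchmark

after = torch.backends.cudnn.benchmark
print(inside, after)
```

Saving and restoring the value keeps the training configuration untouched, so the autotuner stays active for the 256 crops.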

I don’t know if this is a bug or if I did something wrong.

BR,
Patrick

Could you check whether you are running out of memory?
If that’s not the case, could you post an executable code snippet as well as the output of:

python -m torch.utils.collect_env
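One way to check the memory from inside the script is sketched below. `gpu_memory_report` is a hypothetical helper name; the numbers only cover what this process's PyTorch caching allocator holds, so memory used by other processes on the same GPU would need a look at `nvidia-smi` instead.

```python
import torch


def gpu_memory_report(device=0):
    """Return allocated/reserved/total bytes for one GPU, or None if CUDA is unavailable."""
    if not torch.cuda.is_available():
        return None
    return {
        # Bytes currently occupied by live tensors of this process.
        "allocated": torch.cuda.memory_allocated(device),
        # Bytes held by the caching allocator (includes cached, unused blocks).
        "reserved": torch.cuda.memory_reserved(device),
        # Physical memory of the device.
        "total": torch.cuda.get_device_properties(device).total_memory,
    }


report = gpu_memory_report()
print(report)
```

Calling it right before the failing convolution would show how much headroom is left at that point.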

Hi,

thanks for the reply. I'm not running out of memory.

Environment information:

Collecting environment information...
PyTorch version: 1.8.0
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: Could not collect

Python version: 3.8 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: Quadro RTX 8000
GPU 1: Quadro RTX 8000

Nvidia driver version: 450.51.05
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.2
[pip3] torch==1.8.0
[pip3] torchvision==0.9.0
[conda] blas                      1.0                         mkl
[conda] cudatoolkit               10.2.89              h8f6ccaa_8    conda-forge
[conda] mkl                       2020.4             h726a3e6_304    conda-forge
[conda] mkl-service               2.3.0            py38h1e0a361_2    conda-forge
[conda] mkl_fft                   1.3.0            py38h5c078b8_1    conda-forge
[conda] mkl_random                1.2.0            py38hc5bc63f_1    conda-forge
[conda] numpy                     1.19.2           py38h54aff64_0
[conda] numpy-base                1.19.2           py38hfa32c7d_0
[conda] pytorch                   1.8.0           py3.8_cuda10.2_cudnn7.6.5_0    pytorch
[conda] torchvision               0.9.0                py38_cu102    pytorch

I’m not sure what you mean by an executable code snippet; the one provided above is executable and triggers the error. Nevertheless, I wrapped it in a main function:

import torch

def main():
    torch.backends.cuda.matmul.allow_tf32 = True
    torch.backends.cudnn.benchmark = True
    torch.backends.cudnn.deterministic = False
    torch.backends.cudnn.allow_tf32 = True
    data = torch.randn([1, 256, 128, 128], dtype=torch.float, device='cuda', requires_grad=True)
    net = torch.nn.Conv2d(256, 256, kernel_size=[3, 3], padding=[1, 1], stride=[1, 1], dilation=[1, 1], groups=1)
    net = net.cuda().float()
    out = net(data)
    out.backward(torch.randn_like(out))
    torch.cuda.synchronize()

if __name__ == "__main__":
    main()

Error:

Traceback (most recent call last):
  File "error.py", line 16, in <module>
    main()
  File "error.py", line 11, in main
    out = net(data)
  File ".../anaconda3/envs/TestTorch18/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File ".../anaconda3/envs/TestTorch18/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 399, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File ".../anaconda3/envs/TestTorch18/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 395, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
You can try to repro this exception using the following code snippet. If that doesn't trigger the error, please include your original repro script when reporting this issue.

import torch
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.benchmark = True
torch.backends.cudnn.deterministic = False
torch.backends.cudnn.allow_tf32 = True
data = torch.randn([1, 256, 128, 128], dtype=torch.float, device='cuda', requires_grad=True)
net = torch.nn.Conv2d(256, 256, kernel_size=[3, 3], padding=[1, 1], stride=[1, 1], dilation=[1, 1], groups=1)
net = net.cuda().float()
out = net(data)
out.backward(torch.randn_like(out))
torch.cuda.synchronize()

ConvolutionParams
    data_type = CUDNN_DATA_FLOAT
    padding = [1, 1, 0]
    stride = [1, 1, 0]
    dilation = [1, 1, 0]
    groups = 1
    deterministic = false
    allow_tf32 = true
input: TensorDescriptor 0x556f5877abe0
    type = CUDNN_DATA_FLOAT
    nbDims = 4
    dimA = 1, 256, 128, 128,
    strideA = 4194304, 16384, 128, 1,
output: TensorDescriptor 0x556f595a3a20
    type = CUDNN_DATA_FLOAT
    nbDims = 4
    dimA = 1, 256, 128, 128,
    strideA = 4194304, 16384, 128, 1,
weight: FilterDescriptor 0x556f595fe4b0
    type = CUDNN_DATA_FLOAT
    tensor_format = CUDNN_TENSOR_NCHW
    nbDims = 4
    dimA = 256, 256, 3, 3,
Pointer addresses:
    input: 0x7ff411c00000
    output: 0x7ff412e40000
    weight: 0x7ff412c00000
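As a sanity check on the dump above, the strides and buffer sizes can be recomputed from the listed dimensions. This is only a sketch in plain Python with hypothetical helper names; it assumes densely packed NCHW tensors with 4-byte floats, and it says nothing about any workspace memory that cuDNN algorithms may request on top of these buffers.

```python
def contiguous_strides(dims):
    """Strides (in elements) of a densely packed row-major tensor."""
    strides = [1] * len(dims)
    for i in range(len(dims) - 2, -1, -1):
        strides[i] = strides[i + 1] * dims[i + 1]
    return strides


def nbytes(dims, itemsize=4):
    """Buffer size in bytes; CUDNN_DATA_FLOAT uses 4 bytes per element."""
    n = itemsize
    for d in dims:
        n *= d
    return n


# Dimensions and pointer addresses copied from the dump.
input_dims = [1, 256, 128, 128]
weight_dims = [256, 256, 3, 3]
input_ptr = 0x7ff411c00000
weight_ptr = 0x7ff412c00000
output_ptr = 0x7ff412e40000

print(contiguous_strides(input_dims))  # [4194304, 16384, 128, 1], matches strideA
print(nbytes(input_dims) // 2**20, "MiB for input, same for output")  # 16 MiB
print(weight_ptr - input_ptr == nbytes(input_dims))    # True: weight starts right after input
print(output_ptr - weight_ptr == nbytes(weight_dims))  # True: output starts right after weight
```

So the three tensors themselves are contiguous and together take well under 40 MiB, which is tiny compared to the memory of the card.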

Great! Thanks for the update. We’ll check the workload and forward it to cudnn, if applicable.