Backward calculation fails with batch size > 1 when using cudnn, with error CUDNN_STATUS_INTERNAL_ERROR

Hi

I’ve been at a loss with this issue for some time now, and it’s blocking my research.

On a certain dataset I use, the loss.backward() call fails with the error below. It happens only when using cudnn, with a batch size > 1, and on NVIDIA RTX 20xx cards. On 1080 cards everything works fine; it also works when I use a different dataset, set the batch size to 1, or disable cudnn (a sketch of how I toggle this is below).
I’m using Ubuntu 20.04, CUDA 11.2, and cudnn 8.0.
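
For reference, this is roughly how I disable cudnn between runs; the Conv3d model and input shapes here are placeholders standing in for my actual setup:

import torch

# Toggling this flag off avoids the crash entirely in my runs;
# True together with batch size > 1 reproduces the error.
torch.backends.cudnn.enabled = False

net = torch.nn.Conv3d(4, 1, kernel_size=3, padding=1).cuda()  # placeholder model
data = torch.randn(2, 4, 5, 360, 640, device='cuda', requires_grad=True)  # batch size 2

out = net(data)
out.sum().backward()  # fails with CUDNN_STATUS_INTERNAL_ERROR when cudnn is enabled
torch.cuda.synchronize()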

I’ve seen similar issues in this forum, but without solutions.

Thanks for any help

  • Error log (with CUDA_LAUNCH_BLOCKING=1):
    , in train
    loss_sum.backward()
    File "/external/conda/lib/python3.8/site-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
    File "/external/conda/lib/python3.8/site-packages/torch/autograd/__init__.py", line 130, in backward
    Variable._execution_engine.run_backward(
    RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
    You can try to repro this exception using the following code snippet. If that doesn’t trigger the error, please include your original repro script when reporting this issue.

import torch
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.benchmark = True
torch.backends.cudnn.deterministic = False
torch.backends.cudnn.allow_tf32 = True
data = torch.randn([2, 4, 5, 360, 640], dtype=torch.float, device='cuda', requires_grad=True)
net = torch.nn.Conv3d(4, 1, kernel_size=[3, 3, 3], padding=[1, 1, 1], stride=[1, 1, 1], dilation=[1, 1, 1], groups=1)
net = net.cuda().float()
out = net(data)
out.backward(torch.randn_like(out))
torch.cuda.synchronize()

ConvolutionParams
data_type = CUDNN_DATA_FLOAT
padding = [1, 1, 1]
stride = [1, 1, 1]
dilation = [1, 1, 1]
groups = 1
deterministic = false
allow_tf32 = true
input: TensorDescriptor 0x55e8cdd6dc30
type = CUDNN_DATA_FLOAT
nbDims = 5
dimA = 2, 4, 5, 360, 640,
strideA = 4608000, 1152000, 230400, 640, 1,
output: TensorDescriptor 0x7f0ea8015430
type = CUDNN_DATA_FLOAT
nbDims = 5
dimA = 2, 1, 5, 360, 640,
strideA = 1152000, 1152000, 230400, 640, 1,
weight: FilterDescriptor 0x7f0ea80410d0
type = CUDNN_DATA_FLOAT
tensor_format = CUDNN_TENSOR_NCHW
nbDims = 5
dimA = 1, 4, 3, 3, 3,
Pointer addresses:
input: 0x7f0eb2000000
output: 0x7f0ed0c00000
weight: 0x7f0fb9bff800

I cannot reproduce this error on an RTX2080Ti using the posted code snippet with cudnn8.1 and CUDA11.2.
Could you post an executable code snippet that reproduces this error, as the suggested cudnn repro snippet isn’t able to trigger it?

Thanks for replying.

I cleaned up my environment, reinstalled and verified cudnn, and reinstalled torch using pip. The code snippet above still fails for me, as does the following code:

import torch
from torch.backends import cudnn
torch.backends.cudnn.benchmark = True
out = torch.randn([2, 4, 3, 360, 640], dtype=torch.float, device='cuda', requires_grad=True)
net = torch.nn.Conv3d(4, 1, kernel_size=[3, 3, 3], padding=[1, 1, 1], stride=[1, 1, 1], dilation=[1, 1, 1]).cuda()
out = net(out)
out.sum().backward()

Setting torch.backends.cudnn.benchmark = False makes it succeed (a sketch of that variant follows the next snippet). Additionally, the following code, which involves no convolution, succeeds:

import torch
from torch.backends import cudnn
torch.backends.cudnn.benchmark = True
out = torch.randn([2, 4, 3, 360, 640], dtype=torch.float, device='cuda', requires_grad=True)
out.sum().backward()
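
And here is the failing snippet again with only the benchmark flag flipped; this version runs to completion on my machine:

import torch
torch.backends.cudnn.benchmark = False  # the only change vs. the failing snippet
data = torch.randn([2, 4, 3, 360, 640], dtype=torch.float, device='cuda', requires_grad=True)
net = torch.nn.Conv3d(4, 1, kernel_size=[3, 3, 3], padding=[1, 1, 1], stride=[1, 1, 1], dilation=[1, 1, 1]).cuda()
out = net(data)
out.sum().backward()  # succeeds with benchmarking disabled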

I’ve seen similar issues reported in this forum and on GitHub.

Frankly, I’m not sure how to proceed with this, and setting benchmark=False globally is not the solution I’d want; the scoped stopgap I’ve been trying is sketched below.
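
In the meantime, I disable benchmarking only around the failing module via the torch.backends.cudnn.flags context manager (a minimal sketch, assuming the flags signature from the PyTorch version I have installed; the Conv3d is again a placeholder for my real model):

import torch

net = torch.nn.Conv3d(4, 1, kernel_size=3, padding=1).cuda()  # placeholder for the real model
data = torch.randn(2, 4, 3, 360, 640, device='cuda', requires_grad=True)

# Turn off benchmarking only inside this block; the global flag stays untouched.
with torch.backends.cudnn.flags(enabled=True, benchmark=False, deterministic=False, allow_tf32=True):
    out = net(data)
    out.sum().backward()  # succeeds for me with benchmarking disabled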

I still cannot reproduce the failure on the RTX2080Ti I’m using with the binaries for CUDA11.0 + cudnn8 and CUDA10.2 + cudnn7.
Which GPUs exactly are you using, and could you run the environment collection script (python -m torch.utils.collect_env) and post its output here as well?

/external/conda/bin/python /home/ronk/.config/JetBrains/PyCharmCE2020.3/scratches/scratch_22.py
Collecting environment information…
PyTorch version: 1.7.1+cu110
Is debug build: False
CUDA used to build PyTorch: 11.0
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.2 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: Could not collect

Python version: 3.8 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: GeForce RTX 2070 with Max-Q Design
Nvidia driver version: 460.32.03
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.1.0
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.20.1
[pip3] torch==1.7.1+cu110
[pip3] torchaudio==0.7.2
[pip3] torchcontrib==0.0.2
[pip3] torchvision==0.8.2+cu110
[conda] numpy 1.20.1 py38h18fd61f_0 conda-forge
[conda] torch 1.7.1+cu110 pypi_0 pypi
[conda] torchaudio 0.7.2 pypi_0 pypi
[conda] torchcontrib 0.0.2 pypi_0 pypi
[conda] torchvision 0.8.2+cu110 pypi_0 pypi

Referencing the corresponding bug: Pytorch autograd sometimes fails with CUDNN_STATUS_INTERNAL_ERROR · Issue #52263 · pytorch/pytorch · GitHub
As stated there, this should be fixed with torch 1.8 and cudnn 8.1.
I’m testing it and will update.
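
A quick way to confirm the upgraded environment once it’s installed, using the standard PyTorch version attributes:

import torch

print(torch.__version__)               # expect 1.8.x or newer
print(torch.version.cuda)              # CUDA version the binaries were built against
print(torch.backends.cudnn.version())  # expect >= 8100, i.e. cudnn 8.1.0
print(torch.backends.cudnn.is_available())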