## 🐛 Bug
I am receiving `RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR` when calling `F.conv2d()`.
Shapes and devices of inputs:
```
input, torch.Size([4, 256, 128, 128]), cuda:0
weight, torch.Size([256, 256, 3, 3]), cuda:0
stride, 1
padding, 1
```
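As a quick sanity check on the shapes above, the expected output size can be worked out by hand. This is only a sketch: the helper name `conv2d_output_size` is hypothetical, it uses the standard output-size formula for a 2-D convolution, and it assumes dilation 1 and 4-byte float32 elements.

```python
def conv2d_output_size(size, kernel, stride=1, padding=0, dilation=1):
    """Output length of a convolution along one spatial dimension."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


# Shapes taken from the report: input [N, C_in, H, W], weight [C_out, C_in, kH, kW].
n, c_in, h, w = 4, 256, 128, 128
c_out, _, kh, kw = 256, 256, 3, 3

out_shape = (
    n,
    c_out,
    conv2d_output_size(h, kh, stride=1, padding=1),
    conv2d_output_size(w, kw, stride=1, padding=1),
)

# Assumption: float32 tensors, i.e. 4 bytes per element.
bytes_per_tensor = out_shape[0] * out_shape[1] * out_shape[2] * out_shape[3] * 4

print(out_shape)                          # (4, 256, 128, 128)
print(bytes_per_tensor / 2**20, "MiB")    # 64.0 MiB
```

With stride 1 and padding 1, a 3x3 kernel preserves the 128x128 spatial size, so the input and output tensors are each about 64 MiB.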
## To Reproduce
```
  File "/home/timbrooks/anaconda3/envs/stylegan2_lightning/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/timbrooks/anaconda3/envs/stylegan2_lightning/lib/python3.8/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/home/timbrooks/anaconda3/envs/stylegan2_lightning/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/timbrooks/code/stylegan2_lightning/models/layers/conv.py", line 51, in forward
    out = F.conv2d(
RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
You can try to repro this exception using the following code snippet. If that doesn't trigger the error, please include your original repro script when reporting this issue.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.benchmark = True
torch.backends.cudnn.deterministic = False
torch.backends.cudnn.allow_tf32 = True
data = torch.randn([4, 256, 128, 128], dtype=torch.float, device='cuda', requires_grad=True)
net = torch.nn.Conv2d(256, 256, kernel_size=[3, 3], padding=[1, 1, 0], stride=[1, 1, 0], dilation=[1, 1, 0], groups=1)
net = net.cuda().float()
out = net(data)
out.backward(torch.randn_like(out))
torch.cuda.synchronize()
ConvolutionParams
    data_type = CUDNN_DATA_FLOAT
    padding = [1, 1, 0]
    stride = [1, 1, 0]
    dilation = [1, 1, 0]
    groups = 1
    deterministic = false
    allow_tf32 = true
input: TensorDescriptor 0x55660f9da960
    type = CUDNN_DATA_FLOAT
    nbDims = 4
    dimA = 4, 256, 128, 128,
    strideA = 4194304, 16384, 128, 1,
output: TensorDescriptor 0x55660eed9f50
    type = CUDNN_DATA_FLOAT
    nbDims = 4
    dimA = 4, 256, 128, 128,
    strideA = 4194304, 16384, 128, 1,
weight: FilterDescriptor 0x55660f9f08c0
    type = CUDNN_DATA_FLOAT
    tensor_format = CUDNN_TENSOR_NCHW
    nbDims = 4
    dimA = 256, 256, 3, 3,
Pointer addresses:
    input: 0x7e3f86001000
    output: 0x7e3f5a000000
    weight: 0x7e4074d80000
```
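The generated snippet in the log passes three-element lists (`[1, 1, 0]`) for `padding`, `stride`, and `dilation` to a 2-D convolution, so it may not run as printed. Below is a minimal standalone sketch of the same repro, assuming the intended values are the 2-D settings from the shapes section (stride 1, padding 1) and dilation 1. The function name `run_repro` is hypothetical, and the TF32 flags are left out because they may not exist on older builds such as 1.6.0.

```python
try:
    import torch
except ImportError:  # lets the script load on machines without PyTorch
    torch = None


def run_repro():
    """Run one forward/backward pass of the reported convolution on the GPU.

    Returns the output shape as a tuple, or None if PyTorch or CUDA is unavailable.
    """
    if torch is None or not torch.cuda.is_available():
        return None

    # Same cuDNN settings as in the generated snippet.
    torch.backends.cudnn.benchmark = True
    torch.backends.cudnn.deterministic = False

    data = torch.randn(4, 256, 128, 128, dtype=torch.float, device="cuda", requires_grad=True)

    # Assumption: plain 2-D parameters instead of the three-element lists in the log.
    net = torch.nn.Conv2d(256, 256, kernel_size=3, padding=1, stride=1, dilation=1, groups=1)
    net = net.cuda().float()

    out = net(data)
    out.backward(torch.randn_like(out))
    torch.cuda.synchronize()  # forces pending asynchronous CUDA errors to surface here
    return tuple(out.shape)


if __name__ == "__main__":
    print(run_repro())
```

If this standalone version completes without raising, the failure may depend on state from the surrounding training run rather than on the convolution parameters alone.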
## Expected behavior
## Environment
The issue occurs on both the stable build and the nightly build. Both environments are listed below.
```
PyTorch version: 1.6.0
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A
OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: version 3.10.2
Python version: 3.8 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: 10.2.89
GPU models and configuration:
GPU 0: Quadro RTX 8000
GPU 1: Quadro RTX 8000
GPU 2: Quadro RTX 8000
GPU 3: Quadro RTX 8000
GPU 4: Quadro RTX 8000
GPU 5: Quadro RTX 8000
GPU 6: Quadro RTX 8000
GPU 7: Quadro RTX 8000
GPU 8: Quadro RTX 8000
GPU 9: Quadro RTX 8000
Nvidia driver version: 440.82
cuDNN version: /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudnn.so.7.6.5
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.19.1
[pip3] pytorch-lightning==0.9.0
[pip3] torch==1.6.0
[pip3] torchvision==0.7.0
[conda] blas 2.16 mkl conda-forge
[conda] cudatoolkit 10.2.89 hfd86e86_1
[conda] libblas 3.8.0 16_mkl conda-forge
[conda] libcblas 3.8.0 16_mkl conda-forge
[conda] liblapack 3.8.0 16_mkl conda-forge
[conda] liblapacke 3.8.0 16_mkl conda-forge
[conda] mkl 2020.2 256 conda-forge
[conda] numpy 1.19.1 py38hbc27379_2 conda-forge
[conda] pytorch 1.6.0 py3.8_cuda10.2.89_cudnn7.6.5_0 pytorch
[conda] pytorch-lightning 0.9.0 py_0 conda-forge
[conda] torchvision 0.7.0 py38_cu102 pytorch
```
```
PyTorch version: 1.7.0.dev20201002
Is debug build: True
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A
OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: version 3.10.2
Python version: 3.8 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: 10.2.89
GPU models and configuration:
GPU 0: Quadro RTX 8000
GPU 1: Quadro RTX 8000
GPU 2: Quadro RTX 8000
GPU 3: Quadro RTX 8000
GPU 4: Quadro RTX 8000
GPU 5: Quadro RTX 8000
GPU 6: Quadro RTX 8000
GPU 7: Quadro RTX 8000
GPU 8: Quadro RTX 8000
GPU 9: Quadro RTX 8000
Nvidia driver version: 440.82
cuDNN version: /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudnn.so.7.6.5
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.19.1
[pip3] pytorch-lightning==0.9.0
[pip3] torch==1.7.0.dev20201002
[pip3] torchvision==0.8.0.dev20201002
[conda] blas 2.16 mkl conda-forge
[conda] cudatoolkit 10.2.89 hfd86e86_1
[conda] libblas 3.8.0 16_mkl conda-forge
[conda] libcblas 3.8.0 16_mkl conda-forge
[conda] liblapack 3.8.0 16_mkl conda-forge
[conda] liblapacke 3.8.0 16_mkl conda-forge
[conda] mkl 2020.2 256 conda-forge
[conda] numpy 1.19.1 py38hbc27379_2 conda-forge
[conda] pytorch 1.7.0.dev20201002 py3.8_cuda10.2.89_cudnn7.6.5_0 pytorch-nightly
[conda] pytorch-lightning 0.9.0 py_0 conda-forge
[conda] torchvision 0.8.0.dev20201002 py38_cu102 pytorch-nightly
```
cc @csarofeen @ptrblck