Hi
I've been stuck on this issue for some time now, and it's blocking my research.
On a certain dataset I use, the loss.backward() call fails with the error below. It happens only when using cuDNN, with a batch size > 1, and on NVIDIA RTX 20xx cards. On 1080 cards everything works fine; the error also disappears if I use a different dataset, set the batch size to 1, or disable cuDNN.
I'm using Ubuntu 20.04, CUDA 11.2 and cuDNN 8.0.
I've seen similar issues on the forum, but without solutions.
Thanks for any help
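For anyone hitting the same symptoms, the knobs I used to isolate cuDNN can be scoped like this (a sketch; all of these are standard torch.backends.cudnn attributes, the same flags the error message prints, and the context-manager form keeps the slowdown local to the failing step):

```python
# Sketch: ways to rule cuDNN in or out, using the standard
# torch.backends.cudnn knobs (the same flags the error message prints).
import torch

# Turn off the autotuner: benchmark mode lets cuDNN pick conv algorithms
# per input shape, and a bad pick can surface as CUDNN_STATUS_INTERNAL_ERROR.
torch.backends.cudnn.benchmark = False

# Restrict cuDNN to deterministic conv algorithms, narrowing the kernel set.
torch.backends.cudnn.deterministic = True

# Bypass cuDNN only around the failing step, so the rest of training keeps
# its speed; convolutions then fall back to the native ATen kernels.
with torch.backends.cudnn.flags(enabled=False):
    pass  # run the forward/backward of the failing batch here
```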
- Error log (with CUDA_LAUNCH_BLOCKING=1):
, in train
loss_sum.backward()
File "/external/conda/lib/python3.8/site-packages/torch/tensor.py", line 221, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/external/conda/lib/python3.8/site-packages/torch/autograd/__init__.py", line 130, in backward
Variable._execution_engine.run_backward(
RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
You can try to repro this exception using the following code snippet. If that doesn’t trigger the error, please include your original repro script when reporting this issue.
import torch
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.benchmark = True
torch.backends.cudnn.deterministic = False
torch.backends.cudnn.allow_tf32 = True
data = torch.randn([2, 4, 5, 360, 640], dtype=torch.float, device='cuda', requires_grad=True)
net = torch.nn.Conv3d(4, 1, kernel_size=[3, 3, 3], padding=[1, 1, 1], stride=[1, 1, 1], dilation=[1, 1, 1], groups=1)
net = net.cuda().float()
out = net(data)
out.backward(torch.randn_like(out))
torch.cuda.synchronize()
ConvolutionParams
data_type = CUDNN_DATA_FLOAT
padding = [1, 1, 1]
stride = [1, 1, 1]
dilation = [1, 1, 1]
groups = 1
deterministic = false
allow_tf32 = true
input: TensorDescriptor 0x55e8cdd6dc30
type = CUDNN_DATA_FLOAT
nbDims = 5
dimA = 2, 4, 5, 360, 640,
strideA = 4608000, 1152000, 230400, 640, 1,
output: TensorDescriptor 0x7f0ea8015430
type = CUDNN_DATA_FLOAT
nbDims = 5
dimA = 2, 1, 5, 360, 640,
strideA = 1152000, 1152000, 230400, 640, 1,
weight: FilterDescriptor 0x7f0ea80410d0
type = CUDNN_DATA_FLOAT
tensor_format = CUDNN_TENSOR_NCHW
nbDims = 5
dimA = 1, 4, 3, 3, 3,
Pointer addresses:
input: 0x7f0eb2000000
output: 0x7f0ed0c00000
weight: 0x7f0fb9bff800
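Since CUDNN_STATUS_INTERNAL_ERROR can also mask an out-of-memory inside cuDNN's workspace allocation, here is a quick environment and memory check worth running alongside the log (a sketch using long-standing torch.cuda APIs):

```python
# Sketch: report the versions actually loaded and current device memory,
# since CUDNN_STATUS_INTERNAL_ERROR sometimes hides a workspace OOM.
import torch

print("torch:", torch.__version__)               # PyTorch build
print("cuda:", torch.version.cuda)               # CUDA toolkit PyTorch was built with
print("cudnn:", torch.backends.cudnn.version())  # cuDNN version actually loaded

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print("device:", props.name, f"{props.total_memory / 2**30:.1f} GiB total")
    print("allocated:", f"{torch.cuda.memory_allocated(0) / 2**30:.2f} GiB")
```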