Hi,
thanks for the reply. I´m not running out of memory.
Enviroment information:
Collecting environment information...
PyTorch version: 1.8.0
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A
OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: Could not collect
Python version: 3.8 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: Quadro RTX 8000
GPU 1: Quadro RTX 8000
Nvidia driver version: 450.51.05
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.19.2
[pip3] torch==1.8.0
[pip3] torchvision==0.9.0
[conda] blas 1.0 mkl
[conda] cudatoolkit 10.2.89 h8f6ccaa_8 conda-forge
[conda] mkl 2020.4 h726a3e6_304 conda-forge
[conda] mkl-service 2.3.0 py38h1e0a361_2 conda-forge
[conda] mkl_fft 1.3.0 py38h5c078b8_1 conda-forge
[conda] mkl_random 1.2.0 py38hc5bc63f_1 conda-forge
[conda] numpy 1.19.2 py38h54aff64_0
[conda] numpy-base 1.19.2 py38hfa32c7d_0
[conda] pytorch 1.8.0 py3.8_cuda10.2_cudnn7.6.5_0 pytorch
[conda] torchvision 0.9.0 py38_cu102 pytorch
I’m not sure what you mean with executable code snippet, the one provided above is executable and triggers the error. But I wrapped it in a main function:
import torch
def main():
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.benchmark = True
torch.backends.cudnn.deterministic = False
torch.backends.cudnn.allow_tf32 = True
data = torch.randn([1, 256, 128, 128], dtype=torch.float, device='cuda', requires_grad=True)
net = torch.nn.Conv2d(256, 256, kernel_size=[3, 3], padding=[1, 1], stride=[1, 1], dilation=[1, 1], groups=1)
net = net.cuda().float()
out = net(data)
out.backward(torch.randn_like(out))
torch.cuda.synchronize()
if __name__ == "__main__":
main()
Error:
Traceback (most recent call last):
File "error.py", line 16, in <module>
main()
File "error.py", line 11, in main
out = net(data)
File ".../anaconda3/envs/TestTorch18/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File ".../anaconda3/envs/TestTorch18/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 399, in forward
return self._conv_forward(input, self.weight, self.bias)
File ".../anaconda3/envs/TestTorch18/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 395, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
You can try to repro this exception using the following code snippet. If that doesn't trigger the error, please include your original repro script when reporting this issue.
import torch
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.benchmark = True
torch.backends.cudnn.deterministic = False
torch.backends.cudnn.allow_tf32 = True
data = torch.randn([1, 256, 128, 128], dtype=torch.float, device='cuda', requires_grad=True)
net = torch.nn.Conv2d(256, 256, kernel_size=[3, 3], padding=[1, 1], stride=[1, 1], dilation=[1, 1], groups=1)
net = net.cuda().float()
out = net(data)
out.backward(torch.randn_like(out))
torch.cuda.synchronize()
ConvolutionParams
data_type = CUDNN_DATA_FLOAT
padding = [1, 1, 0]
stride = [1, 1, 0]
dilation = [1, 1, 0]
groups = 1
deterministic = false
allow_tf32 = true
input: TensorDescriptor 0x556f5877abe0
type = CUDNN_DATA_FLOAT
nbDims = 4
dimA = 1, 256, 128, 128,
strideA = 4194304, 16384, 128, 1,
output: TensorDescriptor 0x556f595a3a20
type = CUDNN_DATA_FLOAT
nbDims = 4
dimA = 1, 256, 128, 128,
strideA = 4194304, 16384, 128, 1,
weight: FilterDescriptor 0x556f595fe4b0
type = CUDNN_DATA_FLOAT
tensor_format = CUDNN_TENSOR_NCHW
nbDims = 4
dimA = 256, 256, 3, 3,
Pointer addresses:
input: 0x7ff411c00000
output: 0x7ff412e40000
weight: 0x7ff412c00000