PyTorch error: unable to find a valid cuDNN algorithm

While running a PyTorch script on a cluster, I’m getting the following error:

Traceback (most recent call last):
  File "/global/u2/a/anshuman/StructRepGen_Dev/diff_gpu.py", line 681, in <module>
    z , z_mu, z_var = unet(batch_noisy, t)
  File "/global/homes/a/anshuman/.conda/envs/srg/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/global/u2/a/anshuman/StructRepGen_Dev/diff_gpu.py", line 560, in forward
    x = self.init_conv(x)
  File "/global/homes/a/anshuman/.conda/envs/srg/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/global/homes/a/anshuman/.conda/envs/srg/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 307, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/global/homes/a/anshuman/.conda/envs/srg/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 303, in _conv_forward
    return F.conv1d(input, weight, bias, self.stride,
RuntimeError: Unable to find a valid cuDNN algorithm to run convolution

I saw a post about this error that attributed it to a mismatch between the CUDA version and the PyTorch version. I tried reinstalling PyTorch, but the problem persists. Do I need to explicitly set the CUDA path and the PyTorch path? If so, how do I do that? Thanks.

I have the following versions:

torch.__version__ = 1.12.1
torch.version.cuda = 11.3
torch.backends.cudnn.version() = 8302

Path of CUDA (which nvcc):

/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7/bin/nvcc

CUDA version in the terminal (nvcc --version):

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_18:49:52_PDT_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0
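The values listed above can be collected in one place with a short script. This is only a sketch: `report_versions` is a hypothetical helper name, not part of PyTorch, and it assumes nothing beyond the attributes already quoted in this post plus `torch.cuda.is_available()`. It also records which `nvcc` is on the PATH, so the locally installed toolkit and the CUDA version PyTorch was built with can be compared side by side.

```python
import importlib.util
import shutil


def report_versions():
    """Collect the CUDA-related versions seen by the shell and by PyTorch."""
    # Path of the locally installed CUDA compiler, or None if not on PATH.
    info = {"nvcc_on_path": shutil.which("nvcc")}

    if importlib.util.find_spec("torch") is None:
        info["torch"] = None  # PyTorch is not installed in this environment
        return info

    import torch

    info["torch"] = torch.__version__
    # CUDA version the PyTorch binary was built with (None for CPU-only builds).
    info["torch_cuda"] = torch.version.cuda
    # cuDNN version PyTorch loads (None if cuDNN is unavailable).
    info["cudnn"] = torch.backends.cudnn.version()
    info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in report_versions().items():
        print(f"{key}: {value}")
```

If `torch_cuda` and the release reported by `nvcc` differ, as they do in this post (11.3 versus 11.7), that by itself only shows that two different CUDA installations are present.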

Your locally installed CUDA toolkit and cuDNN won’t be used if you’ve installed the PyTorch binaries, since they ship with their own CUDA dependencies. Could you update PyTorch to the latest version and check whether you still hit the error?
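After updating, a quick way to check whether convolutions run at all is a small standalone forward pass. The following is a minimal sketch, not the model from the traceback: `conv1d_smoke_test` is a hypothetical name, and the layer and tensor sizes are arbitrary assumptions chosen only to exercise `Conv1d` on the GPU.

```python
import importlib.util


def conv1d_smoke_test():
    """Run one small Conv1d forward pass on the GPU and report the outcome."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"

    import torch

    if not torch.cuda.is_available():
        return "CUDA not available to PyTorch"

    # Arbitrary small layer and input: (batch, channels, length) = (2, 4, 16).
    conv = torch.nn.Conv1d(in_channels=4, out_channels=8, kernel_size=3, padding=1).cuda()
    x = torch.randn(2, 4, 16, device="cuda")
    try:
        y = conv(x)
        torch.cuda.synchronize()  # surface asynchronous CUDA errors here
    except RuntimeError as err:
        return f"failed: {err}"
    return f"ok, output shape {tuple(y.shape)}"


if __name__ == "__main__":
    print(conv1d_smoke_test())
```

If this small case passes but the full script still fails, the installation itself is probably fine and the cause is more likely specific to the script.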

Surprisingly, it worked when I reinstalled it with CUDA 11. Thanks.