Hi, I am sorry for repeating this issue, which has been posted here many times before. I am getting the following error:
File "/home/sd/anaconda3/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED
I have tried the previous answers here: RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED - #2 by ptrblck. But when I try to install torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0, I get the following error:
ERROR: No matching distribution found for torch==1.8.0+cu111
To make sure that I am using compatible versions of all the packages, I am listing them below.
python: 3.10.9
cuda compilation tools: 10.1
torch: 2.0.0+cu117
Also, here are the GPU details:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.161.03    Driver Version: 470.161.03    CUDA Version: 11.4   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA TITAN RTX    Off  | 00000000:03:00.0 Off |                  N/A |
| 41%   41C    P8    24W / 280W |      0MiB / 24217MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Quadro K620         Off  | 00000000:A1:00.0 Off |                  N/A |
| 48%   59C    P0     3W /  30W |    450MiB /  2002MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    1   N/A  N/A    151344      C   python                           447MiB  |
+-----------------------------------------------------------------------------+
Is there anything wrong with the versions that is causing this error? Any help is very much appreciated.
Could you post a minimal and executable code snippet reproducing the issue, please?
Also, it seems you are using your Quadro K620, which has only ~2GB of memory, instead of the TITAN RTX with ~24GB. In this case you could easily run out of memory, which could also raise this error message if cuDNN fails to initialize its handle.
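To check this, a small sketch like the one below (it makes no assumptions about your setup and falls back gracefully if PyTorch or CUDA is unavailable) prints the free memory per visible GPU:

```python
# Sketch: report free/total memory for each visible GPU, so you can see
# whether a device (e.g. the 2GB K620) is close to running out of memory.
# Falls back gracefully if PyTorch or CUDA is unavailable.
reports = []
try:
    import torch
    if torch.cuda.is_available():
        for i in range(torch.cuda.device_count()):
            free, total = torch.cuda.mem_get_info(i)  # values in bytes
            name = torch.cuda.get_device_name(i)
            reports.append(f"cuda:{i} ({name}): "
                           f"{free / 1e9:.2f} GB free of {total / 1e9:.2f} GB")
    else:
        reports.append("CUDA is not available")
except ImportError:
    reports.append("PyTorch is not installed")

for line in reports:
    print(line)
```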
Hi, thanks for the reply. I do not have a minimal code snippet reproducing the error; I am getting it just by running the exact code from here: Transfer Learning for Computer Vision Tutorial — PyTorch Tutorials 2.0.0+cu117 documentation.
Maybe you could just copy the code and run it as a check, if that's not a problem.
The code works for me using torch==2.0.0+cu118 on a 3090, and I still think you might be running out of memory on the K620, as nvidia-smi also indicates a Python process is running on this GPU.
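If you want to make sure only the TITAN RTX can be used, you could mask the other GPU before importing torch. A sketch (the device index is an assumption based on the nvidia-smi output above; note that nvidia-smi and CUDA can enumerate devices in different orders unless CUDA_DEVICE_ORDER=PCI_BUS_ID is set):

```python
# Sketch: restrict this process to a single GPU before the first CUDA call,
# so "cuda:0" can only map to that card. CUDA_DEVICE_ORDER=PCI_BUS_ID makes
# CUDA's device numbering match nvidia-smi's; verify afterwards with
# torch.cuda.get_device_name(0).
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"  # match nvidia-smi's ordering
os.environ["CUDA_VISIBLE_DEVICES"] = "0"        # keep only GPU 0 (the TITAN RTX above)

try:
    import torch
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
except ImportError:  # fallback so the sketch runs anywhere
    device = "cpu"
print(device)
```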
Hi, the code has the following line:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
which means the TITAN RTX will be used, doesn't it?
However, I have changed it to "cuda:1" and the error still appears.
Also, conda list gives me the following packages:
nvidia-cublas-cu11 11.10.3.66 pypi_0 pypi
nvidia-cublas-cu12 12.1.0.26 pypi_0 pypi
nvidia-cuda-cupti-cu11 11.7.101 pypi_0 pypi
nvidia-cuda-nvrtc-cu11 11.7.99 pypi_0 pypi
nvidia-cuda-runtime-cu11 11.7.99 pypi_0 pypi
nvidia-cuda-runtime-cu12 12.1.55 pypi_0 pypi
nvidia-cudnn-cu11 8.5.0.96 pypi_0 pypi
nvidia-cudnn-cu12 8.9.0.131 pypi_0 pypi
nvidia-cufft-cu11 10.9.0.58 pypi_0 pypi
nvidia-curand-cu11 10.2.10.91 pypi_0 pypi
nvidia-cusolver-cu11 11.4.0.1 pypi_0 pypi
nvidia-cusparse-cu11 11.7.4.91 pypi_0 pypi
nvidia-nccl-cu11 2.14.3 pypi_0 pypi
nvidia-nvtx-cu11 11.7.91 pypi_0 pypi
Is the error occurring because two different versions are present?
Yes, this could be the case. How did you install PyTorch, and did you manually install any nvidia-* packages? Note that this use case is not supported, as it can easily break your environment.
Especially since you are now mixing libraries coming from two different CUDA major releases (11 vs 12).
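You can spot the mix directly from the package names; a small sketch (the sample list is copied from your conda list output above):

```python
import re

# Sketch: detect mixed CUDA major releases from installed NVIDIA package
# names, e.g. as reported by `pip list` or `conda list`. The `-cuNN` suffix
# of each nvidia-* wheel encodes the CUDA major version it was built for.
packages = [
    "nvidia-cublas-cu11", "nvidia-cublas-cu12",
    "nvidia-cudnn-cu11", "nvidia-cudnn-cu12",
    "nvidia-cuda-runtime-cu11", "nvidia-cuda-runtime-cu12",
    "nvidia-cufft-cu11", "nvidia-nccl-cu11",
]

cuda_majors = {m.group(1) for name in packages
               if (m := re.search(r"-cu(\d+)$", name))}
mixed = len(cuda_majors) > 1
print("CUDA majors found:", sorted(cuda_majors), "| mixed:", mixed)
```

Anything other than a single CUDA major version in that set means libraries from different toolkit releases can end up being loaded together.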
Actually, when I got access to the GPU, PyTorch was already installed on it. I think I may have accidentally installed some other versions of nvidia-* packages while installing another package. Can you please suggest which of the above packages I should remove?
I would uninstall all PyTorch and nvidia-* packages and install a single binary with the desired CUDA version. Alternatively, you could also create a new, empty virtual environment and install PyTorch there.
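Concretely, that clean-up could look like this sketch (package names are taken from the conda list output above; the cu117 index URL is one example of a single matching build, which a 470 driver can run via CUDA 11 minor-version compatibility):

```shell
# Remove the PyTorch binaries and every manually installed nvidia-* wheel
pip uninstall -y torch torchvision torchaudio
pip freeze | grep '^nvidia-' | cut -d'=' -f1 | xargs -r pip uninstall -y

# Alternatively, start from a fresh, empty environment...
python -m venv fresh-torch
source fresh-torch/bin/activate

# ...and install a single build with one CUDA version (cu117 as an example)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu117
```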
Thank you, I installed pytorch in a new environment and it works now.
Good to hear it’s working now and thanks for the update.