Cannot for the life of me get PyTorch and CUDA to install/work

AlMason86 · February 14, 2024, 11:53am

I am very new to this, so it's probably something I am doing wrong, but I cannot get PyTorch installed with CUDA. I thought I had managed it, but then something was wrong with the resulting environment that meant I couldn’t install any other packages!

I have Anaconda UI installed and use the Anaconda Prompt. I made a new environment specially for the CUDA stuff using Python 3.11.

My machine has an Nvidia Quadro RTX A2000, I think. It's a Dell Precision workstation laptop. From what I could figure out compatibility-wise, I should be OK with CUDA toolkit 11.8 and cuDNN 9.0, both of which are installed. Curiously though, if I run “nvidia-smi” in the command prompt it tells me the CUDA version is 12.2 (I have never installed that; the driver version is 536.45).

When I go to the PyTorch site, select all the right boxes, and run the resulting command, it makes numerous failed attempts at “Solving Environment” and then just hangs on “Solving Environment: /” with a spinning character. So I am now out of ideas!

Any help would be greatly appreciated.

AlMason86 · February 14, 2024, 12:16pm

After typing all that, it suddenly loaded up a load of packages. I hit “Y”, let it work, restarted my machine, and now it seems to work???

I have PyTorch 2.2.0 and CUDA 11.8 according to:

torch.__version__

and

torch.version.cuda

although nvidia-smi still reports 12.2 as the CUDA version?

Maybe this will all work somehow? haha

EDIT: So now I am back to getting a ream of errors when trying to install packages like pandas or scikit-learn. The error seems to reference “torchvision-0.15.2-cpupy310h7187fe4_0”.

ptrblck · February 14, 2024, 2:43pm

Your locally installed CUDA toolkit (including cuDNN, NCCL, and other libs) won’t be used if you install the PyTorch binaries, as they ship with all needed CUDA dependencies.
You need to properly install a compatible NVIDIA driver and can then use any install command from here. E.g. pip install torch will install the current torch==2.2.0+cu121 version, which ships with CUDA 12.1 runtime dependencies.

The CUDA version reported by nvidia-smi belongs to the driver, not to the locally installed CUDA toolkit used to build and develop applications.
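
To make the driver/runtime distinction concrete, here is a minimal sketch of a compatibility check. It assumes the rough rule that the CUDA version shown by nvidia-smi is the newest runtime the driver supports, so a PyTorch binary should work when its bundled runtime is not newer than that. The helper names `parse_version` and `driver_supports_runtime` are hypothetical, and the version strings are the ones mentioned in this thread.

```python
def parse_version(text: str) -> tuple:
    """Turn a dotted version such as '12.2' into a tuple of ints, e.g. (12, 2)."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_supports_runtime(driver_cuda: str, runtime_cuda: str) -> bool:
    """Conservative check (an assumption, not an official NVIDIA rule):
    the driver-reported CUDA version must be at least the runtime version
    bundled with the PyTorch binary."""
    return parse_version(driver_cuda) >= parse_version(runtime_cuda)


# Driver 536.45 reports CUDA 12.2 in nvidia-smi.
print(driver_supports_runtime("12.2", "11.8"))  # True: cu118 binaries are fine
print(driver_supports_runtime("12.2", "12.1"))  # True: cu121 binaries are fine
print(driver_supports_runtime("11.4", "12.1"))  # False: driver likely too old
```

The check is deliberately strict. CUDA's minor-version compatibility can make some combinations work that this function rejects, so a False result is a hint to update the driver rather than proof of failure.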

Uninstall all other packages, as you are trying to mix binaries with and without CUDA support. E.g. torchvision is the CPU-only build in your environment right now.
If you get stuck, just create a new, clean, and empty virtual environment and run:

pip3 install torch torchvision torchaudio
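
As a rough way to spot mixed CPU and CUDA binaries, the sketch below classifies pip-style version strings by their local label (the part after "+"). It assumes the installed wheels carry such a label, as torch==2.2.0+cu121 does. The helper names `build_flavor` and `has_mixed_builds` are hypothetical, and the torchvision versions are illustrative values, not taken from the environment discussed here.

```python
def build_flavor(version: str) -> str:
    """Classify a version string such as '2.2.0+cu121' by its local label."""
    _, _, local = version.partition("+")
    if local.startswith("cu"):
        return "cuda"
    if local == "cpu":
        return "cpu"
    return "unknown"  # no label, or a label this sketch does not know


def has_mixed_builds(versions: dict) -> bool:
    """Return True when both CPU-only and CUDA builds appear together."""
    flavors = {build_flavor(v) for v in versions.values()}
    return {"cpu", "cuda"} <= flavors


# Illustrative values: a CUDA torch next to a CPU-only torchvision.
print(has_mixed_builds({"torch": "2.2.0+cu121", "torchvision": "0.17.0+cpu"}))    # True
print(has_mixed_builds({"torch": "2.2.0+cu121", "torchvision": "0.17.0+cu121"}))  # False
```

In a real environment the strings would come from torch.__version__ and torchvision.__version__. Conda build strings use a different format and are not handled here.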

AlMason86 · February 14, 2024, 6:20pm

Right I will try this.

So to understand right… I am overcomplicating this? I just need a fairly recent Nvidia driver, and then the right combination of pip or conda install commands will take care of it?

EDIT: Also torchvision and torchaudio are perhaps not necessary? I don’t do anything with audio or vision. But I do need recurrent neural networks.

ptrblck · February 14, 2024, 7:20pm

Yes, this is correct. Your locally installed CUDA toolkit will be used if you build PyTorch from source or build a custom CUDA extension. You won’t need it to execute PyTorch workloads, as the binaries (pip wheels and conda binaries) install all needed requirements. You would however need to install an NVIDIA driver to allow communication with your GPU.

In this case you could skip installing torchvision and torchaudio and stick to PyTorch only.
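
For illustration, recurrent layers live in the core torch package, so a sketch like the following needs neither torchvision nor torchaudio. The function name `lstm_output_shape` and the layer sizes are arbitrary choices for this example. The import guard is only there so the snippet does not crash where torch is missing.

```python
import importlib.util


def lstm_output_shape():
    """Run a tiny LSTM forward pass using only the core torch package.

    Returns the output shape as a tuple, or None when torch is not installed.
    """
    if importlib.util.find_spec("torch") is None:
        return None
    import torch
    from torch import nn

    # Use the GPU when the installed build can see one, otherwise the CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True).to(device)
    x = torch.randn(4, 10, 8, device=device)  # (batch, sequence, features)
    output, (h_n, c_n) = lstm(x)
    return tuple(output.shape)  # (batch, sequence, hidden_size)


print(lstm_output_shape())
```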

AlMason86 · February 15, 2024, 2:53pm

hmmm, this has only installed a CPU version.

ptrblck · February 15, 2024, 2:59pm

What does “this” mean?

AlMason86 · February 15, 2024, 7:34pm

It means when I ask torch if there is a cuda processor available it says FALSE.

torch.cuda.is_available()
# False

I noticed when it was installing packages that it was only selecting the CPU variant of PyTorch.

So I don’t know what to do really, I think I will just leave it for now. There is a new guy from Japan in my office and he seems to know how to get it set up so I will ask him to help me.

ptrblck · February 15, 2024, 8:14pm

Without any information on how you’ve tried to install it, we won’t be able to help.
Creating a new environment and installing PyTorch via pip install torch works fine:

conda create -n test_install python=3.10
...
conda activate test_install
pip install torch
Collecting torch
  Downloading torch-2.2.0-cp310-cp310-manylinux1_x86_64.whl.metadata (25 kB)
...
Collecting mpmath>=0.19 (from sympy->torch)
  Downloading mpmath-1.3.0-py3-none-any.whl (536 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 536.2/536.2 kB 474.8 kB/s eta 0:00:00
Downloading torch-2.2.0-cp310-cp310-manylinux1_x86_64.whl (755.5 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 755.5/755.5 MB 17.3 MB/s eta 0:00:00
Downloading nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl (410.6 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 410.6/410.6 MB 19.2 MB/s eta 0:00:00
Downloading nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (14.1 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.1/14.1 MB 20.1 MB/s eta 0:00:00
Downloading nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 23.7/23.7 MB 19.9 MB/s eta 0:00:00
Downloading nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (823 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 823.6/823.6 kB 28.5 MB/s eta 0:00:00
Downloading nvidia_cudnn_cu12-8.9.2.26-py3-none-manylinux1_x86_64.whl (731.7 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 731.7/731.7 MB 19.8 MB/s eta 0:00:00
Downloading nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl (121.6 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 121.6/121.6 MB 19.8 MB/s eta 0:00:00
Downloading nvidia_curand_cu12-10.3.2.106-py3-none-manylinux1_x86_64.whl (56.5 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56.5/56.5 MB 19.5 MB/s eta 0:00:00
Downloading nvidia_cusolver_cu12-11.4.5.107-py3-none-manylinux1_x86_64.whl (124.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 124.2/124.2 MB 19.8 MB/s eta 0:00:00
Downloading nvidia_cusparse_cu12-12.1.0.106-py3-none-manylinux1_x86_64.whl (196.0 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 196.0/196.0 MB 20.0 MB/s eta 0:00:00
Downloading nvidia_nccl_cu12-2.19.3-py3-none-manylinux1_x86_64.whl (166.0 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 166.0/166.0 MB 18.6 MB/s eta 0:00:00
Downloading nvidia_nvtx_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (99 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 99.1/99.1 kB 91.4 MB/s eta 0:00:00
Downloading triton-2.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (167.9 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 167.9/167.9 MB 19.3 MB/s eta 0:00:00
...
Installing collected packages: mpmath, typing-extensions, sympy, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufft-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, networkx, MarkupSafe, fsspec, filelock, triton, nvidia-cusparse-cu12, nvidia-cudnn-cu12, jinja2, nvidia-cusolver-cu12, torch
Successfully installed MarkupSafe-2.1.5 filelock-3.13.1 fsspec-2024.2.0 jinja2-3.1.3 mpmath-1.3.0 networkx-3.2.1 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-8.9.2.26 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.19.3 nvidia-nvjitlink-cu12-12.3.101 nvidia-nvtx-cu12-12.1.105 sympy-1.12 torch-2.2.0 triton-2.2.0 typing-extensions-4.9.0
...
python -c "import torch; print(torch.__version__); print(torch.cuda.is_available())"
# 2.2.0+cu121
# True
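
The one-line check above can be extended into a small report that is easy to paste into a help request. This is a sketch only: the function name `cuda_report` and the dictionary keys are hypothetical, while the torch attributes it reads (torch.__version__, torch.version.cuda, torch.cuda.is_available(), torch.cuda.get_device_name()) are standard.

```python
import importlib.util
import platform


def cuda_report() -> dict:
    """Collect the facts that usually explain a False from is_available()."""
    report = {"python": platform.python_version(), "torch_installed": False}
    if importlib.util.find_spec("torch") is None:
        return report  # nothing more to inspect without torch

    import torch

    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    # torch.version.cuda is a string such as '12.1', or None for CPU-only builds.
    report["built_with_cuda"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


for key, value in cuda_report().items():
    print(f"{key}: {value}")
```

If built_with_cuda is None, a CPU-only binary was installed and no driver change will help. If it shows a CUDA version but cuda_available is False, the driver is the more likely suspect.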

AlMason86 · February 16, 2024, 12:09pm

I was using Python 3.11; maybe this is why?

Anyway… I completely removed all traces of Python and VS Code from my machine and then reinstalled Python via Anaconda. I made no venvs or anything like that; I will just work in (base) because I am not doing multiple things.

I then just installed PyTorch with the command given by the website when selecting the latest versions of everything:

conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia

It is working now.

Thank you for your help

Bhishan · June 13, 2024, 2:03pm

How did you install CUDA 12.1 on Windows? I am working on a Windows 10 Dell computer. Previously I successfully installed CUDA 11.7, but when I tried to install CUDA 12.1 from CUDA Toolkit 12.1 Update 1 Downloads | NVIDIA Developer

I could not run the exe file. (Note that double-clicking does not work; I renamed .exe to .7zip, unzipped it, and ran ./setup.exe, but that installer fails at the end.)

Please let me know how you ran the CUDA 12.1 installer file on a Windows 10 computer.