Hi @ptrblck , I have the same issue with CUDA driver compatibility and my PyTorch version. My driver supports CUDA 11.4, while the PyTorch build I have is 2.0.1+cu117, meaning it bundles CUDA 11.7. Could you suggest whether I should downgrade my PyTorch version or update my driver to support a newer CUDA version?
I'm using it to train my YOLOv9 model, running on an NVIDIA GeForce RTX 2060 SUPER. The incompatibility causes my model to hit a RuntimeError before even the first epoch.
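As a side note on reading these version strings: the +cuXYZ suffix in a PyTorch version encodes the CUDA runtime the binary ships with. A small illustrative helper to decode it (the function is hypothetical, not a PyTorch API):

```python
# Illustrative helper (not part of PyTorch): decode the bundled CUDA
# version from a PyTorch version string such as "2.0.1+cu117" -> (11, 7).
def bundled_cuda(torch_version: str):
    if "+cu" not in torch_version:
        return None  # CPU-only build (e.g. "2.4.1+cpu") or no local tag
    tag = torch_version.split("+cu", 1)[1]  # e.g. "117" or "124"
    return int(tag[:-1]), int(tag[-1])

print(bundled_cuda("2.0.1+cu117"))  # (11, 7)
print(bundled_cuda("2.4.1+cpu"))    # None
```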
Hi @ptrblck , I wanted to ask a question about conda, PyTorch, and CUDA which I hope you can help with. For the past few days, I've been trying to connect my Jupyter notebook on Google Colab to my local runtime, and I've had issues using PyTorch with CUDA, getting the error below when I run my code.
However, when I run !nvidia-smi to check whether I have available GPUs, it shows that GPUs are present; they just aren't accessible to PyTorch.
Running the notebook on the server is based on my conda environment, in which I installed relevant libraries using this command:
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 cudatoolkit=11.8 cudnn=8.9 -c pytorch -c nvidia
I specified pytorch-cuda=11.8 because I read that it should exactly match the cudatoolkit version, which is 11.8 for the latest conda release, and that cudnn=8.9 is the matching cuDNN for that toolkit.
Did I miss any steps or is there something that I need to reconfigure?
I added cudatoolkit to the command because I saw a StackOverflow post about using PyTorch with CUDA in a conda environment. Just now, I created a new conda environment and used the official "start locally" install instructions: conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
Somehow it still shows the same error as above, and I can't use PyTorch with CUDA. I didn't install the CUDA toolkit or anything else beforehand. Assuming all I did was install conda and run the above command to install PyTorch, and I want to use the resources from a Colab Jupyter notebook, do I need to install the CUDA toolkit or anything else beforehand for CUDA support?
Hey there! I'm having trouble setting up CUDA in Visual Studio. I was previously able to do it, but during testing I reinstalled PyTorch and it shifted back to the CPU build. I have an RTX 3070 Ti, I installed torch-2.4.1+cpu-cp312-cp312-win_amd64.whl, my CUDA version is 12.4.1, and I'm on Python 3.12.5. I added all the environment variables (this screenshot shows that). However, when I check:
import torch
print(torch.__version__)
it says: 2.4.1+cpu
Please help me fix my problem!
Also, here are my CUDA specifications from nvcc --version:
You’ve installed the CPU-only binary and should install a CUDA-enabled one instead. Select the desired CUDA version in our install matrix, copy/paste the command into your terminal, and execute it to install PyTorch with CUDA support.
Your locally installed CUDA toolkit won’t be used as PyTorch binaries ship with their own CUDA runtime dependencies (assuming the CUDA-enabled PyTorch binaries were installed).
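For anyone landing here, a minimal sanity check along these lines can confirm which build is installed (a sketch; it assumes nothing beyond torch possibly being importable in the current environment):

```python
# Quick sanity check: which PyTorch build is installed, and can it see a GPU?
try:
    import torch
    bundled = torch.version.cuda        # None on a CPU-only build
    usable = torch.cuda.is_available()  # False without a compatible driver/GPU
    print(f"torch {torch.__version__}, bundled CUDA: {bundled}, GPU usable: {usable}")
except ImportError:
    bundled, usable = None, False
    print("torch is not installed in this environment")
```

If this prints a +cpu version with bundled CUDA None, the fix is the one described above: reinstall using the command from the install matrix for the desired CUDA version.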
@ptrblck I was reading your (super helpful) posts while trying to understand a PyTorch performance issue. I am running inference on an A100 with FlanT5 using the transformers library. It depends on the latest torch, so I ended up with PyTorch 2.5.1. I realised that my inference was a lot slower on the A100 than on an RTX 3090 despite running much larger batch sizes, so I downgraded to 1.13.1+cu117. The installed NVIDIA driver (550.127.05) supports CUDA up to 12.4, so that should be fine. However, when I run echo $LD_LIBRARY_PATH | grep cuda I can see the CUDA toolkit is 11.4. From your answers above I understand this shouldn't make a difference, but I'm scratching my head hard trying to understand why the same code runs a lot slower on 2.5.1 vs 1.13 ... Any ideas?
Your locally installed CUDA toolkit won’t be used if you’ve installed the PyTorch binaries as they ship with their own CUDA runtime dependencies. It’s hard to say which part of your code is slower in 2.5.1 compared to 1.13.1 as the former was released on Oct. 29th, 2024 while the latter binary was released on Dec. 15, 2022, so a lot of code has changed. You could profile both runs via e.g. Nsight Systems to narrow down which part of the code is slower.
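As a crude first step before reaching for Nsight Systems, a simple wall-clock harness can at least quantify the per-batch gap between the two environments (a sketch; for real GPU timing you would also call torch.cuda.synchronize() before reading the clock, since CUDA kernels launch asynchronously):

```python
import time

def bench(fn, iters=10, warmup=3):
    """Average wall-clock seconds per call of fn, after a few warmup runs."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters

# Toy stand-in for a forward pass; replace with your actual model call.
per_iter = bench(lambda: sum(range(100_000)))
print(f"{per_iter * 1e3:.3f} ms/iter")
```

Running the same harness under both PyTorch versions tells you whether the slowdown is in the model forward pass itself or elsewhere (data loading, tokenization, host-device copies), which narrows down where to point the profiler.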