Hi @ptrblck , I have the same issue with CUDA driver compatibility and my PyTorch version. My driver supports CUDA 11.4, while the PyTorch build I have is 2.0.1+cu117, meaning it bundles CUDA 11.7. Could you suggest whether I should downgrade my PyTorch version or update my driver to support a newer CUDA version?
I'm using it to train my YOLOv9 model, running on an NVIDIA GeForce RTX 2060 SUPER. The incompatibility causes my model to hit a RuntimeError before even the first epoch.
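As a side note on reading these version strings: the +cuXYZ suffix in a PyTorch version encodes the CUDA runtime the binary ships with. A small illustrative helper to decode it (the function is hypothetical, not a PyTorch API):

```python
# Illustrative helper (not part of PyTorch): decode the bundled CUDA
# version from a PyTorch version string such as "2.0.1+cu117" -> (11, 7).
def bundled_cuda(torch_version: str):
    if "+cu" not in torch_version:
        return None  # CPU-only build (e.g. "2.4.1+cpu") or no local tag
    tag = torch_version.split("+cu", 1)[1]  # e.g. "117" or "124"
    return int(tag[:-1]), int(tag[-1])

print(bundled_cuda("2.0.1+cu117"))  # (11, 7)
print(bundled_cuda("2.4.1+cpu"))    # None
```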
Hi @ptrblck , I wanted to ask a question about conda, PyTorch, and CUDA which I hope you can help with. For the past few days, I've been trying to connect my Jupyter notebook on Google Colab to my local runtime, and I've had issues using PyTorch with CUDA, getting the error below when I run my code.
However, when I run !nvidia-smi to check whether I have available GPUs, it shows that GPUs are present; they just aren't accessible to PyTorch.
Running the notebook on the server is based on my conda environment, in which I installed relevant libraries using this command:
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 cudatoolkit=11.8 cudnn=8.9 -c pytorch -c nvidia
I specified pytorch-cuda=11.8 because I read that it should exactly match the cudatoolkit version, which is 11.8 for the latest conda release, and that cudnn=8.9 is the matching cuDNN for that toolkit.
Did I miss any steps or is there something that I need to reconfigure?
I added cudatoolkit to the command because I saw a StackOverflow post about using PyTorch with CUDA in a conda environment. Just now, I created a new conda environment and used the official "start locally" install instructions: conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
Somehow it still shows the same error as above, and I can't use PyTorch with CUDA. I didn't install the CUDA toolkit or anything else beforehand. Assuming all I did was install conda and run the above command to install PyTorch, and I want to use the resources from a Colab Jupyter notebook, do I need to install the CUDA toolkit or anything else beforehand for CUDA support?
Hey there! I'm having trouble setting up CUDA in Visual Studio. I was previously able to do it, but during testing I reinstalled PyTorch and it shifted back to the CPU build. I have an RTX 3070 Ti, I installed torch-2.4.1+cpu-cp312-cp312-win_amd64.whl, my CUDA version is 12.4.1, and I'm on Python 3.12.5. I added all the environment variables (this screenshot shows that). However, when I check:
import torch
print(torch.__version__)
it says: 2.4.1+cpu
Please help me fix my problem!
Also, here are my CUDA specifications from nvcc --version:
You’ve installed the CPU-only binary and should install a CUDA-enabled one instead. Select the desired CUDA version in our install matrix, copy/paste the command into your terminal, and execute it to install PyTorch with CUDA support.
Your locally installed CUDA toolkit won’t be used as PyTorch binaries ship with their own CUDA runtime dependencies (assuming the CUDA-enabled PyTorch binaries were installed).
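For anyone landing here, a minimal sanity check along these lines can confirm which build is installed (a sketch; it assumes nothing beyond torch possibly being importable in the current environment):

```python
# Quick sanity check: which PyTorch build is installed, and can it see a GPU?
try:
    import torch
    bundled = torch.version.cuda        # None on a CPU-only build
    usable = torch.cuda.is_available()  # False without a compatible driver/GPU
    print(f"torch {torch.__version__}, bundled CUDA: {bundled}, GPU usable: {usable}")
except ImportError:
    bundled, usable = None, False
    print("torch is not installed in this environment")
```

If this prints a +cpu version with bundled CUDA None, the fix is the one described above: reinstall using the command from the install matrix for the desired CUDA version.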
@ptrblck I was reading your (super helpful) posts while trying to understand a PyTorch performance issue. I am running inference on an A100 with FlanT5 using the transformers library. It depends on the latest torch, so I ended up with PyTorch 2.5.1. I realised that my inference was a lot slower on the A100 than on an RTX 3090 despite running much larger batch sizes, so I downgraded to 1.13.1+cu117. The installed NVIDIA driver (550.127.05) supports CUDA up to 12.4, so that should be fine. However, when I run echo $LD_LIBRARY_PATH | grep cuda I can see the CUDA toolkit is 11.4. From your answers above I understand this shouldn't make a difference, but I'm scratching my head hard trying to understand why the same code runs a lot slower on 2.5.1 vs 1.13 ... Any ideas?
Your locally installed CUDA toolkit won’t be used if you’ve installed the PyTorch binaries as they ship with their own CUDA runtime dependencies. It’s hard to say which part of your code is slower in 2.5.1 compared to 1.13.1 as the former was released on Oct. 29th, 2024 while the latter binary was released on Dec. 15, 2022, so a lot of code has changed. You could profile both runs via e.g. Nsight Systems to narrow down which part of the code is slower.
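As a crude first step before reaching for Nsight Systems, a simple wall-clock harness can at least quantify the per-batch gap between the two environments (a sketch; for real GPU timing you would also call torch.cuda.synchronize() before reading the clock, since CUDA kernels launch asynchronously):

```python
import time

def bench(fn, iters=10, warmup=3):
    """Average wall-clock seconds per call of fn, after a few warmup runs."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters

# Toy stand-in for a forward pass; replace with your actual model call.
per_iter = bench(lambda: sum(range(100_000)))
print(f"{per_iter * 1e3:.3f} ms/iter")
```

Running the same harness under both PyTorch versions tells you whether the slowdown is in the model forward pass itself or elsewhere (data loading, tokenization, host-device copies), which narrows down where to point the profiler.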