Recently, I installed Ubuntu 20.04 on my system. Since it was a fresh install, I decided to upgrade all the software to the latest versions. So I installed NVIDIA driver 450.51.05 and CUDA 11.0. To my surprise, PyTorch for CUDA 11 has not been rolled out yet.
My question is: should I downgrade the CUDA package to 10.2, or go with the PyTorch build for CUDA 10.2 without downgrading CUDA itself? Also, can anyone explain why PyTorch is built differently for various CUDA versions, and what changes in CUDA between versions?
As explained here, the binaries are not yet built with CUDA 11. However, the initial CUDA 11 enablement PRs are already merged, so you could install from source using CUDA 11.
If you want to use the binaries, you would have to stick to 10.2 for now.
You probably don’t need to downgrade the CUDA 11 installed on your system. As explained here, conda install pytorch torchvision cudatoolkit=10.2 -c pytorch will install the CUDA 10.2 and cudnn binaries within the conda environment, so the system-installed CUDA 11 will not be used at all.
I recently installed Ubuntu 20.04 and NVIDIA driver 450. It took me a while to realize that I didn’t have to build PyTorch from source just because I have CUDA 11 on my system.
Also, if you do actually want to try CUDA 11, the easiest way is to make sure you have a sufficiently new driver and run the PyTorch NGC Docker container. The latest 20.06 container has PyTorch 1.6, CUDA 11, and cuDNN 8. Unfortunately, that cuDNN is a release candidate with some fairly significant performance regressions right now, so it is not always the best idea to be on the bleeding edge.
I was trying to build from source following the steps listed here, with CUDA 11.0 and a GTX 680 graphics card, but the build stopped with an error: nvcc fatal : Unsupported gpu architecture 'compute_30'
Would you mind giving me a bit of advice on how to work around this? Much appreciated.
There is unfortunately no workaround for this, as compute capabilities 3.0 and 3.2 were dropped in CUDA 11, and 3.5, 3.7, and 5.0 were deprecated (release notes).
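To illustrate, here is a small sketch (a hypothetical helper, not part of any CUDA or PyTorch API) that encodes the support changes above, which you could use to check a card before starting a long source build:

```python
# Sketch: compute capabilities affected by CUDA 11.0, per the changes above
# (3.0/3.2 dropped, 3.5/3.7/5.0 deprecated).
DROPPED_IN_CUDA11 = {(3, 0), (3, 2)}
DEPRECATED_IN_CUDA11 = {(3, 5), (3, 7), (5, 0)}

def cuda11_support(compute_capability):
    """Return 'dropped', 'deprecated', or 'supported' for CUDA 11.0."""
    if compute_capability in DROPPED_IN_CUDA11:
        return "dropped"
    if compute_capability in DEPRECATED_IN_CUDA11:
        return "deprecated"
    return "supported"

# A GTX 680 is compute capability 3.0, hence the nvcc error above:
print(cuda11_support((3, 0)))  # dropped
print(cuda11_support((7, 5)))  # supported
```

On a machine with PyTorch installed, you could feed it torch.cuda.get_device_capability() to get the (major, minor) tuple for the local GPU.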
You could build from source with CUDA 11 (+cuDNN 8), use the NGC container, or use the nightly binaries.
Once 1.7 is code frozen, the stable binaries should be released.
Given the above discussion, I’m surprised I was able to use PyTorch 1.6 while my Ubuntu 20.04 host has CUDA 11. For my setup, I used PyTorch in a Docker container based on a python3.8 base image, and I pip-installed PyTorch 1.6 and torchvision 0.7. I did not even install any other cudatoolkit version. To access the GPU from the container, I had the nvidia-docker2 container toolkit package on the host. torch.cuda.is_available() returned True.
The conda binaries and pip wheels ship with their own CUDA (cudnn, NCCL, etc.) runtime, so you don’t need a local CUDA installation to use native PyTorch operations.
However, you would have to install a matching CUDA version if you want to build PyTorch from source or build custom CUDA extensions.
I’m not sure what kind of update you are looking for, but the binaries have supported CUDA 11 since PyTorch 1.7.0, I believe. You can select the corresponding CUDA runtime version by selecting the desired box here.
The nightlies can be used with CUDA 11.3. For CUDA 11.4, we need to refactor the build pipeline a bit, as the cudatoolkit binary was changed, and we decided to work on it after the PyTorch 1.10.0 release.
In the meantime you can of course build from source using CUDA 11.4.
@ptrblck Is there a release roadmap anywhere that details when the PyTorch team is looking to add or deprecate support for specific CUDA versions? For instance, the current installation guide shows 10.2 as the default and 11.3 as an option (with the +cu113 suffix for pip). I am wondering:

1. When will 11.x become the default, without having to be specified explicitly?
2. How much longer will 10.x be supported?
3. Is there an estimate for when >= 12.x support will be added?
Thank you for any references you can provide! This would be helpful for me and other ML engineers when choosing appropriate hardware.
We are working on making CUDA 11 the default, but due to the large size of CUDA and cuDNN, it’s not an easy task. As you might have seen, the size difference between the wheels shipped with CUDA 10.2 vs. 11.3 is huge, and we are evaluating different approaches right now.
See point 1. We hope to be able to come up with a valid approach so that we can release with one setup.
Once CUDA 12 is out, we’ll add support for it (first as a source-build option, then in the wheels).
hi, @ptrblck
I have some confusion about CUDA compatibility.
We have an NVIDIA A10 GPU, and nvidia-smi shows: NVIDIA-SMI 470.57.02, Driver Version: 470.57.02, CUDA Version: 11.4.
We installed PyTorch with pip, using the wheel packages below:
torch-1.3.0+cu100-cp37-cp37m-linux_x86_64.whl
torchvision-0.4.1+cu100-cp37-cp37m-linux_x86_64.whl
Should torch-1.3.0+cu100 be compatible with my GPU driver version?
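As a side note, the CUDA runtime a wheel bundles can be read straight off its filename via the +cuXYZ local version tag; a minimal sketch (the parsing helper is my own, only the filename format comes from the wheels above):

```python
import re

def parse_wheel(filename):
    """Extract (package, version, bundled CUDA runtime) from a wheel name."""
    m = re.match(r"(\w+)-([\d.]+)\+cu(\d+)-", filename)
    if not m:
        return None
    package, version, cu = m.groups()
    # "cu100" -> "10.0", "cu113" -> "11.3": last digit is the minor version
    cuda = f"{cu[:-1]}.{cu[-1]}"
    return package, version, cuda

print(parse_wheel("torch-1.3.0+cu100-cp37-cp37m-linux_x86_64.whl"))
# ('torch', '1.3.0', '10.0')
```

So both wheels above ship the CUDA 10.0 runtime, regardless of what is installed on the host.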
When we run the Python code below, it takes a very long time (more than 8 minutes) to finish, and top shows 100% CPU usage. Why does it take so long?
import torch
torch.Tensor([1,2]).cuda()
And some other torch code, like torch.relu_(input), fails with an error: RuntimeError: CUDA error: no kernel image is available for execution on the device
The CUDA runtime used would be compatible with your driver. However, you are using an Ampere GPU, which needs CUDA >= 11.0.
The CUDA 11 runtime landed in PyTorch 1.7, so you would need to update the PyTorch pip wheels to any version after 1.7 (I would recommend using the latest one) with the CUDA 11 runtime (the current 1.10.0 pip wheels use CUDA 11.3).
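The mismatch boils down to a version comparison; a rough sketch (the minimum-runtime figures per architecture are my own illustrative assumptions, only the Ampere >= 11.0 requirement comes from the answer above):

```python
# Sketch: an Ampere GPU (e.g. the A10) needs a CUDA >= 11.0 runtime, while
# the cu100 wheels bundle CUDA 10.0 -- hence "no kernel image is available
# for execution on the device". Pre-Ampere entries are assumptions for
# illustration, not an official table.
MIN_CUDA_FOR_ARCH = {
    "pascal": (8, 0),
    "volta": (9, 0),
    "turing": (10, 0),
    "ampere": (11, 0),
}

def runtime_supports(arch, runtime):
    """True if a (major, minor) CUDA runtime can target the given architecture."""
    return runtime >= MIN_CUDA_FOR_ARCH[arch]

print(runtime_supports("ampere", (10, 0)))  # False: the cu100 wheel on an A10
print(runtime_supports("ampere", (11, 3)))  # True: the 1.10.0 cu113 wheels
```

The long startup in the first snippet fits the same picture: without a matching binary kernel, the runtime attempts just-in-time compilation on the CPU, which can take minutes before failing or falling through.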