PyTorch with CUDA 11 compatibility

Santhosh_Kumar1 · July 15, 2020, 4:32am

Recently, I installed Ubuntu 20.04 on my system. Since it was a fresh install, I decided to upgrade all the software to the latest version, so I installed NVIDIA driver 450.51.05 and CUDA 11.0. To my surprise, PyTorch for CUDA 11 has not yet been rolled out.

My question is: should I downgrade the CUDA package to 10.2, or go with PyTorch built for CUDA 10.2 without downgrading CUDA itself? Also, can anyone explain why PyTorch is built differently for various CUDA versions, and what changes in CUDA between versions?

ptrblck · July 15, 2020, 5:25am

As explained here, the binaries are not built yet with CUDA11. However, the initial CUDA11 enablement PRs are already merged, so that you could install from source using CUDA11.

If you want to use the binaries, you would have to stick to 10.2 for now.

yueqi · July 21, 2020, 4:45am

You probably don’t need to downgrade the CUDA 11 installed in your system. As explained here, conda install pytorch torchvision cudatoolkit=10.2 -c pytorch will install CUDA 10.2 and cudnn binaries within the Conda environment, so the system-installed CUDA 11 will not be used at all.

I recently installed Ubuntu 20.04 and NVIDIA driver 450. It took me a while to realize that I didn’t have to build PyTorch from source just because I have CUDA 11 on my system.

rwightman · July 21, 2020, 8:47pm

Also, if you do actually want to try CUDA 11, the easiest way is to make sure you have a sufficiently new driver and run the PyTorch NGC Docker container. The latest 20.06 container has PyTorch 1.6, CUDA 11, and cuDNN 8. Unfortunately, cuDNN 8 is a release candidate with some fairly significant performance regressions right now, so it is not always the best idea to be on the bleeding edge.

Yuheng_Zhi · August 21, 2020, 7:40pm

I was trying to build from source following the steps listed here with cuda 11.0 and a GTX 680 graphics card. But it stopped building due to an error:
nvcc fatal : Unsupported gpu architecture 'compute_30'
Would you mind giving me a bit of advice on how to work around this? Much appreciated.

ptrblck · August 21, 2020, 10:41pm

There is unfortunately no workaround for this, as compute capability 3.0 and 3.2 were dropped in CUDA11 and 3.5, 3.7, and 5.0 were deprecated (release notes).
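
To make the support status concrete, here is a minimal sketch that encodes only the compute capabilities named in the reply above. The function name `cuda11_status` and the status labels are hypothetical, chosen for illustration; any capability the reply does not mention is reported as "unlisted" rather than guessed. The GTX 680 from the question is a compute capability 3.0 card.

```python
# Compute capabilities named in the reply above, as (major, minor) tuples.
DROPPED_IN_CUDA11 = {(3, 0), (3, 2)}
DEPRECATED_IN_CUDA11 = {(3, 5), (3, 7), (5, 0)}


def cuda11_status(capability):
    """Classify a (major, minor) compute capability for CUDA 11 (hypothetical helper)."""
    if capability in DROPPED_IN_CUDA11:
        return "dropped"
    if capability in DEPRECATED_IN_CUDA11:
        return "deprecated"
    # Not mentioned in the reply, so no claim is made either way.
    return "unlisted"


print(cuda11_status((3, 0)))  # the GTX 680 from the question
```

For the GTX 680 this reports "dropped", which matches the `Unsupported gpu architecture 'compute_30'` error from nvcc.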

AzimuthLI · September 7, 2020, 12:56pm

Life Saver! Bravo! Thanks!

arogozhnikov · September 29, 2020, 1:10am

Any progress?

NVIDIA ships Ubuntu 20 Docker images only with CUDA 11, and it’s already more than half a year old.

ptrblck · September 29, 2020, 7:52am

You could build from source with CUDA11 (+cudnn8), use the NGC container, or the nightly binaries.
Once 1.7 is code frozen, the stable binaries should be released.

kepler62f · October 27, 2020, 1:28pm

Given the above discussion, I’m surprised I was able to use PyTorch 1.6 while my Ubuntu 20 host has CUDA 11. For my setup, I ran PyTorch in a Docker container built on a python3.8 base image, and I pip installed PyTorch 1.6 and torchvision 0.7. I did not install any other cudatoolkit version. To access the GPU from the container, I had the nvidia-docker2 container toolkit package on the host. torch.cuda.is_available() returned True.

ptrblck · October 28, 2020, 10:35am

The conda binaries and pip wheels ship with their own CUDA runtime (plus cuDNN, NCCL, etc.), so you don’t need a local CUDA installation to use native PyTorch operations.
However, you would have to install a matching CUDA version if you want to build PyTorch from source or build custom CUDA extensions.
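
One way to see which runtime a given binary ships with is to ask PyTorch itself. The snippet below is a minimal sketch: `describe_torch_cuda` is a hypothetical helper name, and it returns None when PyTorch is not installed so that it can be run anywhere.

```python
def describe_torch_cuda():
    """Return CUDA details of the installed PyTorch build, or None if PyTorch is missing."""
    try:
        import torch
    except ImportError:
        return None
    return {
        "torch": torch.__version__,
        "bundled_cuda": torch.version.cuda,  # None for CPU-only builds
        "cudnn": torch.backends.cudnn.version(),  # None if cuDNN is unavailable
        "gpu_usable": torch.cuda.is_available(),
    }


print(describe_torch_cuda())
```

The "bundled_cuda" entry reflects the runtime packaged with the binary, not the toolkit installed on the host, so it can differ from the CUDA version shown by nvidia-smi.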

Ademord · June 16, 2021, 5:13pm

Is there any update on this? I have torch ‘1.9.0+cu102’, but my system has CUDA 11.3.

ptrblck · June 16, 2021, 9:17pm

I’m not sure what kind of update you are looking for, but the binaries have supported CUDA11 since PyTorch 1.7.0, I believe. You can select the corresponding CUDA runtime version by selecting the desired box here.
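
The suffix in a version string such as ‘1.9.0+cu102’ names the CUDA runtime bundled with that binary. As a rough sketch, it can be decoded as shown below. The helper name `bundled_cuda_from_version` is hypothetical, and the parsing assumes the `+cuXYZ` convention in which the last digit is the minor version (so `cu102` means 10.2); other suffix styles simply return None.

```python
import re


def bundled_cuda_from_version(version):
    """Extract (major, minor) of the bundled CUDA runtime from a version string.

    Assumes the '+cuXYZ' suffix convention where the last digit is the minor
    version. Returns None when no such suffix is present.
    """
    match = re.search(r"\+cu(\d{2,})$", version)
    if match is None:
        return None
    digits = match.group(1)
    return int(digits[:-1]), int(digits[-1])


print(bundled_cuda_from_version("1.9.0+cu102"))
```

For ‘1.9.0+cu102’ this yields (10, 2), which is the bundled CUDA 10.2 runtime and is independent of the CUDA 11.3 installed on the system.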

nullgeppetto · September 17, 2021, 1:08am

@ptrblck What can be done for CUDA 11.4? Are binaries available? Could they be installed in a venv with pip? Many thanks!

ptrblck · September 17, 2021, 4:34am

The nightlies can be used with CUDA11.3. For CUDA11.4 we need to refactor the build pipeline a bit as the cudatoolkit binary was changed and we decided to work on it after the PyTorch 1.10.0 release.
In the meantime you can of course build from source using CUDA 11.4.

nullgeppetto · September 17, 2021, 10:54am

Many thanks for your prompt reply!

addisonklinke · October 27, 2021, 9:52pm

@ptrblck Is there a release roadmap anywhere that details when the PyTorch team is looking to add or deprecate support for specific CUDA versions? For instance, the current installation guide shows 10.2 as the default and 11.3 as an option (with the +cu113 suffix for pip). I am wondering:

1. When will 11.x become the default without having to specify explicitly?
2. How much longer will 10.x be supported?
3. Is there an estimate for when >= 12.x support will be added?

Thank you for any references you can provide! This would be helpful for myself and other ML Engineers when choosing appropriate hardware

ptrblck · October 28, 2021, 12:50am

1. We are working on making CUDA11 the default, but due to the large size of CUDA and cuDNN, it’s not an easy task. As you might have seen, the size difference between the wheels shipped with CUDA10.2 vs. 11.3 is huge, and we are checking different approaches right now.
2. See point 1. We hope to be able to come up with a valid approach so that we can release with one setup.
3. Once CUDA12 is out, we’ll add it (first as a source build option, then in the wheels).

GiantRice · October 29, 2021, 10:30am

hi, @ptrblck
I have some confusion about CUDA compatibility.

We have a NVIDIA A10 GPU,
and nvidia-smi shows ( NVIDIA-SMI 470.57.02 Driver Version: 470.57.02 CUDA Version: 11.4)

We installed PyTorch with pip, using the wheel packages below:
torch-1.3.0%2Bcu100-cp37-cp37m-linux_x86_64.whl
torchvision-0.4.1%2Bcu100-cp37-cp37m-linux_x86_64.whl

Should torch-1.3.0+cu100 be compatible with my GPU driver version?

When we run the Python code below, it takes a very long time (more than 8 minutes) to finish, and the top command shows 100% CPU usage. Why does it take so long?

import torch
torch.Tensor([1,2]).cuda()

And some other torch code, such as torch.relu_(input), fails with an error:
RuntimeError: CUDA error: no kernel image is available for execution on the device

ptrblck · October 29, 2021, 10:39am

The CUDA runtime used would be compatible with your driver. However, you are using an Ampere GPU, which needs CUDA>=11.0.
The CUDA 11 runtime landed in PyTorch 1.7, so you would need to update the PyTorch pip wheels to 1.7 or later (I would recommend using the latest one) with the CUDA11 runtime (the current 1.10.0 pip wheels use CUDA11.3).
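
The mismatch between a GPU and the architectures a binary was built for can be sketched as follows. This is a simplified model and not PyTorch's own logic: the helper names `parse_arch` and `build_can_target` are hypothetical, the two architecture lists are illustrative rather than the actual contents of any wheel, and the rules encoded are the general CUDA ones (compiled `sm_XY` kernels run on devices with the same major version and an equal or higher minor version; `compute_XY` PTX can be JIT-compiled for equal or newer devices).

```python
import re


def parse_arch(entry):
    """Split e.g. 'sm_86' into ('sm', (8, 6)); return None for other formats."""
    match = re.fullmatch(r"(sm|compute)_(\d{2,})", entry)
    if match is None:
        return None
    digits = match.group(2)
    return match.group(1), (int(digits[:-1]), int(digits[-1]))


def build_can_target(capability, arch_list):
    """Simplified check: can a build with these architectures run on the device?"""
    for entry in arch_list:
        parsed = parse_arch(entry)
        if parsed is None:
            continue  # skip entries this sketch does not understand
        kind, arch = parsed
        # Compiled kernels need the same major and an equal or lower minor version.
        if kind == "sm" and arch[0] == capability[0] and arch[1] <= capability[1]:
            return True
        # PTX can be JIT-compiled for any equal or newer device.
        if kind == "compute" and arch <= capability:
            return True
    return False


a10 = (8, 6)  # compute capability of the A10 (Ampere)
old_build = ["sm_35", "sm_50", "sm_60", "sm_70"]  # illustrative pre-Ampere list
new_build = ["sm_37", "sm_50", "sm_60", "sm_70", "sm_75", "sm_80", "sm_86"]  # illustrative
print(build_can_target(a10, old_build), build_can_target(a10, new_build))
```

On a machine with a recent PyTorch install and a working GPU, `torch.cuda.get_device_capability()` and `torch.cuda.get_arch_list()` can supply the two inputs in place of the illustrative values.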