Hi,
What is the lowest version number of PyTorch that started supporting CUDA 11.8? Do I need to go all the way up to PyTorch 1.13 to get support?
And to be clear, I’m always compiling from source.
Cheers,
David
I don’t see any CUDA 11.8-specific changes needed to enable source builds (besides enabling the Hopper architecture, if needed), and we were using a 1.13.0 prerelease in our first 11.8 container, as seen here.
thanks @ptrblck for the link
It seems that upgrading to CUDA 11.8 breaks DDP (at least on the Ada architecture, e.g. the RTX 4090 series).
Are you aware of this issue?
Do you know of a workaround other than training on a single GPU?
Best
I doubt CUDA 11.8 breaks DDP workloads, as I haven’t seen any failures with it so far.
Check the few similar issues posted on this discussion board, and post the missing information about your workflow and environment, as well as the log outputs using the debug flags.
thanks @ptrblck
After further investigation, the problem was due to the NCCL backend trying to use peer-to-peer (P2P) transport.
Forcing NCCL_P2P_DISABLE=1 fixed the issue.
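One way to apply this workaround is to set the variable in the environment before any process group is created (a minimal sketch; whether P2P actually needs to be disabled depends on your driver and hardware):

```python
import os

# Disable NCCL peer-to-peer transport. This must be set before
# torch.distributed.init_process_group() runs, since NCCL reads it
# when the communicator is created.
os.environ["NCCL_P2P_DISABLE"] = "1"

# Optional: surface NCCL's own diagnostics when debugging DDP hangs.
os.environ.setdefault("NCCL_DEBUG", "INFO")

print(os.environ["NCCL_P2P_DISABLE"])
```

Alternatively, export NCCL_P2P_DISABLE=1 in the shell before launching the training script, which has the same effect for all spawned ranks.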
This seems to be a known issue. I’m unsure whether NVIDIA or AMD is addressing it.
How can I install torch 1.13.1 (or anything lower than torch 2.0) with CUDA 11.8, without installing from source, using one of the commands here? There is only one example for CUDA 11.8, but it is given for torch 2.0.
When I run
conda install pytorch==1.13.1 pytorch-cuda=11.8 -c pytorch -c nvidia
Torch ends up being installed without CUDA support, since torch.version.cuda is empty and torch.zeros(1).cuda() gives
<stdin>:1: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at /opt/conda/conda-bld/pytorch_1670525493953/work/torch/csrc/utils/tensor_numpy.cpp:77.)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File ".../python3.10/site-packages/torch/cuda/__init__.py", line 221, in _lazy_init
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
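A quick way to distinguish a CPU-only wheel from a CUDA build without triggering that traceback is to inspect torch.version.cuda directly (a small helper sketch; the function name is mine, not part of torch):

```python
def cuda_build_info():
    """Report whether the installed torch wheel was built with CUDA support."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    # torch.version.cuda is None (or empty) in CPU-only builds and a
    # version string like "11.8" in CUDA-enabled builds.
    if not torch.version.cuda:
        return "CPU-only build"
    return f"built with CUDA {torch.version.cuda}"

print(cuda_build_info())
```

A CPU-only install, as in the error above, reports "CPU-only build" here without raising an AssertionError.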
The PyTorch 1.13.1 binaries were built with CUDA 11.6 and 11.7, as given here.
Note that the binaries ship with their own CUDA dependencies and your locally installed CUDA toolkit will be used if you build PyTorch from source or a custom CUDA extension.
In case you need to run torch==1.13.1 with CUDA 11.8, you would thus have to build it from source.
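The source build roughly follows the steps from the PyTorch README (a sketch, assuming CUDA 11.8 and a matching cuDNN are already installed locally; adjust CUDA_HOME and the architecture list to your system):

```shell
# Check out the release tag with its submodules.
git clone --recursive --branch v1.13.1 https://github.com/pytorch/pytorch
cd pytorch
pip install -r requirements.txt

# Point the build at the local CUDA 11.8 toolkit.
export CUDA_HOME=/usr/local/cuda-11.8

# Restrict the build to your GPU's compute capability
# (8.9 for the RTX 4090) to keep compile times down.
export TORCH_CUDA_ARCH_LIST="8.9"

python setup.py install
```

Afterwards, torch.version.cuda should report 11.8 instead of being empty.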