Conda pytorch-cuda=12 for arm64 and Grace Hopper

Is there any way, plan, or timeline to support pytorch-cuda=12 for GH200?
Following the Start Locally | PyTorch instructions we just get a PackagesNotFoundError,
and pytorch-cuda=11 installs CUDA 11.8 plus the CPU-only pytorch build from conda-forge, which is not really usable :slight_smile:

We can use the NGC containers or install from source, but conda/mamba is still the go-to solution for many users…

Cheers!

The binaries are already being built in CI runs, e.g. here (scroll down, download the wheel for the corresponding Python version, and pip install it locally). The CI job integration allowing a pip install from the nightly index is WIP, e.g. here (latest update from 5 mins ago, so we are actively working on it).


Thanks a lot for the info and quick reply, Piotr!
It works for me and it is great to know that it should soon be available in the official channels. Awesome work!

PS: it seems the wheel has grown significantly in size; is that just a product of the dev build?

The current wheel ships as a “large” wheel, which packages every dependency (including cuDNN, NCCL, cuBLAS, etc.) into the wheel’s lib folder (we’ve used the same workflow for nightly builds before). One of our next steps is to use the CUDA PyPI dependencies (as is done for the x86 Linux wheels) to build a “small” wheel again. In the end the same data will be downloaded, but instead of one gigantic wheel, users would download several smaller wheels.
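Since a wheel is just a zip archive, you can peek at which libraries a “large” wheel bundles without installing it. A minimal sketch (the `bundled_libs` helper is my own, not a pip or PyTorch API):

```python
import zipfile

def bundled_libs(wheel_path):
    """List shared libraries packaged under torch/lib/ inside a wheel.

    The "large" aarch64 wheels bundle their CUDA dependencies
    (cuDNN, NCCL, cuBLAS, ...) here; a "small" wheel would pull them
    in as separate PyPI dependencies instead.
    """
    with zipfile.ZipFile(wheel_path) as wf:
        return sorted(
            name for name in wf.namelist()
            if name.startswith("torch/lib/") and ".so" in name
        )
```

Running this on the large aarch64 wheel should show the bundled cuDNN, NCCL, and cuBLAS shared objects, which accounts for the size jump.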


Hello ptrblck,

I know this topic is closed, but I didn’t want to create a new one. How exactly do you install PyTorch 2.3.1 with CUDA 12.1 on an aarch64 system? I also have a new system with a Grace Hopper GPU. It appears it’s still not available on conda: only CUDA 11.8 is offered, and even then it downloads the CPU version of PyTorch. pip install does the same. I’m trying to install for Python 3.11. Any help would be greatly appreciated. Thanks.

You would need to install the current nightly pip wheels with CUDA 12.4. torch==2.3.1+cu121 is not available for ARM+CUDA.

Hi ptrblck, I checked the PyTorch 2.4.1 release on PyPI, and it does not have aarch64 + CUDA support yet; see torch · PyPI.

When will aarch64 + CUDA be available as a public release?

I see the wheels were built; e.g. 2.4.1+cu124 for Python 3.10 is located here: https://download.pytorch.org/whl/cu124/torch-2.4.1-cp310-cp310-linux_aarch64.whl#sha256=baa065a4fb7805c78f16841cfc4f3fc3c6823d1de726087e583c68abe553dad7
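As a side note, the filename itself tells you which interpreter and platform a wheel targets. A simplified sketch of how to read those tags (`parse_wheel_filename` is an illustrative name, and real wheel names may also carry an optional build tag that this ignores):

```python
def parse_wheel_filename(filename):
    # e.g. "torch-2.4.1-cp310-cp310-linux_aarch64.whl"
    # splits into: name, version, Python tag, ABI tag, platform tag
    stem = filename.rsplit(".whl", 1)[0]
    name, version, python_tag, abi_tag, platform_tag = stem.split("-")
    return {
        "name": name,
        "version": version,
        "python": python_tag,      # cp310 == CPython 3.10
        "abi": abi_tag,
        "platform": platform_tag,  # linux_aarch64 for Grace Hopper nodes
    }
```

pip only considers a wheel when the running interpreter matches its tags, which is why a Python 3.11 environment needs the cp311 build rather than this cp310 one.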

Did you forget to specify the index-url?

Hi ptrblck, is there a guide to installing the pip wheels? Thanks.

PS: We were trying to set up the NGC containers, but no luck with those as yet.

Select 2.4.1 with CUDA 12.4 and install torch:

pip install torch --index-url https://download.pytorch.org/whl/cu124
...
Collecting torch
  Downloading https://download.pytorch.org/whl/cu124/torch-2.4.1-cp310-cp310-linux_aarch64.whl (2355.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.4/2.4 GB 43.2 MB/s eta 0:00:00
...
python -c "import torch; print(torch.__version__); print(torch.version.cuda); print(torch.randn(1).cuda())"
2.4.1
12.4
tensor([-1.6937], device='cuda:0')

vision and audio are available as nightlies only for now.


Thank you. I’ll give it a try.

OK, this is only the CPU version; I mistakenly thought it was the CUDA version. PyTorch is not recognizing the GPU.

No, it’s the PyTorch binary with CUDA support as seen in my example executed on a Grace Hopper node.

OK, I missed that part of your previous message. I’ll try again: I did install it, but when I ran torch.cuda.is_available(), I got False.

I got through. I created a fresh environment and installed Python 3.10. I previously had 3.11, and that environment was downloading torch 2.0.1. Thanks a lot, I really appreciate it.

[Screenshot from 2024-09-23 showing the working install]

Great, thanks for verifying! That’s still a bit weird as the Python==3.11 wheels are also available at the specified URL. In any case, good to hear it’s working.
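When pip silently falls back to an old or CPU-only build, it is usually a tag mismatch between the environment and the wheel. A quick way to print the tags pip will match in the active environment (`expected_wheel_tags` is my own helper name, not a pip API):

```python
import platform
import sys

def expected_wheel_tags():
    # pip only considers wheels whose tags match the running interpreter,
    # e.g. ("cp310", "aarch64") on Python 3.10 on a Grace Hopper node
    python_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
    machine = platform.machine()  # "aarch64" on ARM, "x86_64" on Intel/AMD
    return python_tag, machine
```

If the printed Python tag doesn’t match the wheels on the index you’re pointing at, pip will pick whatever older release it can satisfy instead.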

Got it installed, but my code is not recognizing the GPU. It runs on the CPU.

Your code snippet shows you are able to create a tensor on the GPU so your GPU is recognized and usable.

But the actual code I’m trying to run doesn’t pick up the cuda device. Below is a snippet of the code to acquire the device.

if self.args.use_gpu:
    device = torch.device('cuda')
    print(f'Use GPU: cuda:{torch.cuda.current_device()}')
else:
    device = torch.device('cpu')
    print('Use CPU')

Check if self.args.use_gpu is set to True and if not, set it.
We’ve now verified that ARM + CUDA binaries are available and have been shipping since PyTorch 2.4.0 (as well as via the nightly binaries). It seems you might have more questions related to your actual script, so I would recommend starting a new thread to keep this one focused on the binary support.
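To round off the device-selection discussion above: a defensive variant would fall back to CPU whenever the GPU was requested but CUDA isn’t actually usable. A sketch with the availability flag passed in so it runs without a GPU (`select_device` is an illustrative name, not part of the user’s script):

```python
def select_device(use_gpu, cuda_available):
    """Return the device string to pass to torch.device().

    Falls back to CPU (with a warning) when the GPU was requested
    but CUDA isn't usable, instead of failing later at runtime.
    """
    if use_gpu and cuda_available:
        return "cuda"
    if use_gpu:
        print("Warning: use_gpu=True but CUDA is unavailable; using CPU")
    return "cpu"
```

In the snippet above this would be called as `device = torch.device(select_device(self.args.use_gpu, torch.cuda.is_available()))`, so a CPU-only wheel degrades gracefully rather than crashing.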