Conda pytorch-cuda=12 for arm64 and Grace Hopper

Is there any way, plan, or timeline to support pytorch-cuda=12 for GH200?
Following the Start Locally | PyTorch instructions we just get a PackagesNotFoundError,
and pytorch-cuda=11 installs CUDA 11.8 plus the CPU-only pytorch build from conda-forge, which is not really usable :slight_smile:

We can use the NGC containers or install from source, but conda/mamba is still the go-to solution for many users…

Cheers!

The binaries are already being built in CI runs, e.g. here (scroll down, download the wheel for the corresponding Python version, and pip install it locally). The CI job integration allowing a pip install from the nightly index is WIP, e.g. here (latest update from 5 mins ago, so we are actively working on it).


Thanks a lot for the info and quick reply, Piotr!
It works for me and it is great to know that it should soon be available in the official channels. Awesome work!

PS: it seems the wheel has grown significantly in size; is that just a product of the dev build?

The current wheel ships as a “large” wheel, which packages every dependency (including cuDNN, NCCL, cuBLAS, etc.) into the wheel’s lib folder (we’ve used the same workflow for nightly builds before). One of our next steps is to use the CUDA PyPI dependencies (as is done for the x86 Linux wheels) to build a “small” wheel again. In the end the same data will be downloaded, but instead of one gigantic wheel, users would download several smaller wheels.
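Since a wheel is just a zip archive, you can peek at which libraries a “large” wheel bundles without installing it. A minimal sketch (the `bundled_libs` helper is my own, not a pip or PyTorch API):

```python
import zipfile

def bundled_libs(wheel_path):
    """List shared libraries packaged under torch/lib/ inside a wheel.

    The "large" aarch64 wheels bundle their CUDA dependencies
    (cuDNN, NCCL, cuBLAS, ...) here; a "small" wheel would pull them
    in as separate PyPI dependencies instead.
    """
    with zipfile.ZipFile(wheel_path) as wf:
        return sorted(
            name for name in wf.namelist()
            if name.startswith("torch/lib/") and ".so" in name
        )
```

Running this on the large aarch64 wheel should show the bundled cuDNN, NCCL, and cuBLAS shared objects, which accounts for the size jump.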


Hello ptrblck,

I know this topic is closed, but I didn’t want to create a new one. How exactly do you install PyTorch 2.3.1 with CUDA 12.1 on an aarch64 system? I also have a new system with a Grace Hopper GPU. It appears it’s still not available on conda: only CUDA 11.8 is offered, and even then it downloads the CPU version of PyTorch. pip install does the same. I’m trying to install for Python 3.11. Any help would be greatly appreciated. Thanks.

You would need to install the current nightly pip wheels with CUDA 12.4. torch==2.3.1+cu121 is not available for ARM+CUDA.

Hi ptrblck, I checked the PyTorch 2.4.1 release on PyPI, and it does not have aarch64 + CUDA support yet; see torch · PyPI.

When will aarch64 + CUDA be available as a public release?

I see the wheels were built; e.g. 2.4.1+cu124 for Python 3.10 is located here: https://download.pytorch.org/whl/cu124/torch-2.4.1-cp310-cp310-linux_aarch64.whl#sha256=baa065a4fb7805c78f16841cfc4f3fc3c6823d1de726087e583c68abe553dad7
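As a side note, the filename itself tells you which interpreter and platform a wheel targets. A simplified sketch of how to read those tags (`parse_wheel_filename` is an illustrative name, and real wheel names may also carry an optional build tag that this ignores):

```python
def parse_wheel_filename(filename):
    # e.g. "torch-2.4.1-cp310-cp310-linux_aarch64.whl"
    # splits into: name, version, Python tag, ABI tag, platform tag
    stem = filename.rsplit(".whl", 1)[0]
    name, version, python_tag, abi_tag, platform_tag = stem.split("-")
    return {
        "name": name,
        "version": version,
        "python": python_tag,      # cp310 == CPython 3.10
        "abi": abi_tag,
        "platform": platform_tag,  # linux_aarch64 for Grace Hopper nodes
    }
```

pip only considers a wheel when the running interpreter matches its tags, which is why a Python 3.11 environment needs the cp311 build rather than this cp310 one.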

Did you forget to specify the index-url?

Hi ptrblck, is there a guide to installing the pip wheels? Thanks.

PS: We were trying to set up the NGC containers, but no luck with those as yet.

Select 2.4.1 with CUDA 12.4 and install torch:

pip install torch --index-url https://download.pytorch.org/whl/cu124
...
Collecting torch
  Downloading https://download.pytorch.org/whl/cu124/torch-2.4.1-cp310-cp310-linux_aarch64.whl (2355.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.4/2.4 GB 43.2 MB/s eta 0:00:00
...
python -c "import torch; print(torch.__version__); print(torch.version.cuda); print(torch.randn(1).cuda())"
2.4.1
12.4
tensor([-1.6937], device='cuda:0')

vision and audio are available as nightlies only for now.


Thank you. I’ll give it a try.

OK, this is only the CPU version; I mistakenly thought it was the CUDA version. PyTorch is not recognizing the GPU.

No, it’s the PyTorch binary with CUDA support as seen in my example executed on a Grace Hopper node.

OK, I missed that part of your previous message. I’ll try again: I did install it, but when I ran torch.cuda.is_available(), I got False.

I got through. I created a fresh environment and installed Python 3.10. I previously had 3.11, and that environment was downloading torch 2.0.1. Thanks a lot, I really appreciate it.

[Screenshot from 2024-09-23 showing the working install]

Great, thanks for verifying! That’s still a bit weird as the Python==3.11 wheels are also available at the specified URL. In any case, good to hear it’s working.
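When pip silently falls back to an old or CPU-only build, it is usually a tag mismatch between the environment and the wheel. A quick way to print the tags pip will match in the active environment (`expected_wheel_tags` is my own helper name, not a pip API):

```python
import platform
import sys

def expected_wheel_tags():
    # pip only considers wheels whose tags match the running interpreter,
    # e.g. ("cp310", "aarch64") on Python 3.10 on a Grace Hopper node
    python_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
    machine = platform.machine()  # "aarch64" on ARM, "x86_64" on Intel/AMD
    return python_tag, machine
```

If the printed Python tag doesn’t match the wheels on the index you’re pointing at, pip will pick whatever older release it can satisfy instead.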

Got it installed, but my code is not recognizing the GPU. It runs on the CPU.

Your code snippet shows you are able to create a tensor on the GPU so your GPU is recognized and usable.

But the actual code I’m trying to run doesn’t pick up the cuda device. Below is a snippet of the code to acquire the device.

if self.args.use_gpu:
    device = torch.device('cuda')
    print(f'Use GPU: cuda:{torch.cuda.current_device()}')
else:
    device = torch.device('cpu')
    print('Use CPU')

Check if self.args.use_gpu is set to True and if not, set it.
We’ve now verified that ARM + CUDA binaries are available and have been shipping since PyTorch 2.4.0 (as well as via the nightly binaries). It seems you might have more questions related to your actual script, so I would recommend starting a new thread to keep this one focused on the binary support.
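To round off the device-selection discussion above: a defensive variant would fall back to CPU whenever the GPU was requested but CUDA isn’t actually usable. A sketch with the availability flag passed in so it runs without a GPU (`select_device` is an illustrative name, not part of the user’s script):

```python
def select_device(use_gpu, cuda_available):
    """Return the device string to pass to torch.device().

    Falls back to CPU (with a warning) when the GPU was requested
    but CUDA isn't usable, instead of failing later at runtime.
    """
    if use_gpu and cuda_available:
        return "cuda"
    if use_gpu:
        print("Warning: use_gpu=True but CUDA is unavailable; using CPU")
    return "cpu"
```

In the snippet above this would be called as `device = torch.device(select_device(self.args.use_gpu, torch.cuda.is_available()))`, so a CPU-only wheel degrades gracefully rather than crashing.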