Unable to Utilize GPUs with PyTorch on p2.xlarge EC2 Instances

Harsh_Mittal · April 25, 2024, 7:55am

Hi there!

I’m struggling to utilize the GPUs on my p2.xlarge EC2 instance. I’ve installed the NVIDIA driver and DKMS as well.

Check out the screenshot, which shows the warning I’m getting from PyTorch and the GPU overview from running the ‘nvidia-smi’ command.

Here are some details.

EC2 Instance: p2.xlarge
AMI: Ubuntu
Torch Version: 2.X
NVIDIA driver: 470.xxx
CUDA Version: 11.4

Looking forward to a solution.
Thanks!

marksaroufim · April 26, 2024, 2:49am

So the problem is that your NVIDIA driver is too old, so you either need to upgrade your NVIDIA driver or downgrade your PyTorch version. For the former, you can follow guides like the CUDA Installation Guide for Linux.
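As a rough way to check the driver side of this, here is a minimal sketch that compares a driver version string against the minimum Linux driver versions NVIDIA documents for CUDA minor version compatibility (450.80.02 for CUDA 11.1 through 11.8, 525.60.13 for CUDA 12.x). The helper names and the threshold table are assumptions for illustration, and the check ignores the extra caveats of minor version compatibility (for example, PTX JIT compilation).

```python
# Minimum Linux driver versions for CUDA minor version compatibility,
# keyed by CUDA major version (assumed from NVIDIA's compatibility docs).
MIN_LINUX_DRIVER = {
    11: (450, 80, 2),   # CUDA 11.1 - 11.8 (slightly conservative for 11.0)
    12: (525, 60, 13),  # CUDA 12.x
}


def parse_version(text):
    """Turn a dotted version string like '525.60.13' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_supports_cuda(driver_version, cuda_version):
    """Return True if the driver meets the documented minimum for cuda_version."""
    major = parse_version(cuda_version)[0]
    if major not in MIN_LINUX_DRIVER:
        raise ValueError(f"no threshold recorded for CUDA {cuda_version}")
    # Tuple comparison orders versions component by component.
    return parse_version(driver_version) >= MIN_LINUX_DRIVER[major]


if __name__ == "__main__":
    # "470.82.01" is a hypothetical 470-series version string used only
    # for illustration; the exact version in the thread is elided.
    for cuda in ("11.8", "12.1"):
        print(cuda, driver_supports_cuda("470.82.01", cuda))
```

With a 470-series driver this reports that CUDA 11.x builds clear the threshold while CUDA 12.x builds do not, which matches the choice between upgrading the driver and picking a PyTorch wheel built against CUDA 11.x.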

Or, if you want to downgrade your torch version, you can see what’s supported here: https://download.pytorch.org/whl/torch/

Once you’ve found a version you like, you can pip install the wheel directly.
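If you go the downgrade route, the wheels on that index follow the standard wheel naming scheme, e.g. `torch-2.0.1+cu118-cp310-cp310-linux_x86_64.whl` (version, `+cuXXX` CUDA tag, Python tag, ABI tag, platform). The hypothetical helper below filters such filenames by CUDA tag, CPython tag and platform; the filenames in the example are illustrative, not a copy of the index.

```python
import re

# Matches CUDA-tagged torch wheel names:
# torch-{version}+{cuXXX}-{python tag}-{abi tag}-{platform}.whl
WHEEL_RE = re.compile(
    r"^torch-(?P<version>[^+-]+)\+(?P<local>cu\d+)"
    r"-(?P<py>cp\d+)-(?P<abi>cp\d+m?)-(?P<plat>[^.]+)\.whl$"
)


def matching_wheels(filenames, cuda_tag, python_tag, platform="linux_x86_64"):
    """Return (version, filename) pairs matching the CUDA tag, Python tag and platform."""
    matches = []
    for name in filenames:
        m = WHEEL_RE.match(name)
        # Skip names that are not CUDA-tagged wheels or do not match the filters.
        if m and m["local"] == cuda_tag and m["py"] == python_tag and m["plat"] == platform:
            matches.append((m["version"], name))
    return matches


if __name__ == "__main__":
    example = [  # illustrative filenames only
        "torch-2.0.1+cu118-cp310-cp310-linux_x86_64.whl",
        "torch-2.0.1+cu117-cp310-cp310-linux_x86_64.whl",
        "torch-2.0.1+cu118-cp310-cp310-win_amd64.whl",
    ]
    print(matching_wheels(example, "cu118", "cp310"))
```

Once you have a match, `pip install` accepts the wheel’s full URL from the index.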

Harsh_Mittal · April 26, 2024, 5:43am

Hi @marksaroufim,

After some searching, I found that Tesla K80 GPUs are supported by CUDA toolkit 10.0, and the CUDA 10.0 toolkit is only available for Ubuntu 18, 16 and 14.

The Ubuntu versions available to me are 20, 22 and 24.

It seems the K80s are very old and only older software (Ubuntu, CUDA toolkit) supports them.

Do you have any suggestions on how I can use the K80? Or do I need to move to somewhat newer GPUs?

I’m also thinking about downgrading PyTorch to a version that’s compatible with my current CUDA version (11.4). Can you help me find a PyTorch version that will work with CUDA 11.4?

I’d appreciate your help.
Thanks

ptrblck · April 26, 2024, 12:25pm

Newer CUDA toolkits, up to 11.8, also support the Kepler architecture.
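To check whether a particular PyTorch build actually ships kernels for the K80, one option is to compare the device’s compute capability (3.7 for the K80) with the build’s architecture list from `torch.cuda.get_arch_list()`. The helper name below is hypothetical, and it only looks for a native `sm_XY` entry; it does not consider PTX (`compute_XY`) entries.

```python
def build_supports_device(arch_list, capability):
    """Return True if arch_list contains a native sm_XY entry for capability (major, minor)."""
    major, minor = capability
    return f"sm_{major}{minor}" in arch_list


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    if torch is None or not torch.cuda.is_available():
        print("PyTorch with a usable CUDA device is not available here")
    else:
        cap = torch.cuda.get_device_capability(0)  # (3, 7) on a K80
        archs = torch.cuda.get_arch_list()          # e.g. ['sm_50', 'sm_60', ...]
        print(f"device sm_{cap[0]}{cap[1]}, build archs {archs}")
        print("native kernels present:", build_supports_device(archs, cap))
```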

PyTorch binaries ship with their own CUDA runtime dependencies and your locally installed CUDA toolkit won’t be used unless you build PyTorch from source or a custom CUDA extension.
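One way to see this in practice is to compare the CUDA version PyTorch was built with (`torch.version.cuda`, which is `None` for CPU-only builds) against whatever `nvcc --version` reports for a locally installed toolkit. This is a minimal sketch with hypothetical helper names; it only shows that the two versions are independent of each other.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(output):
    """Extract 'X.Y' from `nvcc --version` output, or None if not found."""
    m = re.search(r"release (\d+\.\d+)", output)
    return m.group(1) if m else None


def local_toolkit_version():
    """Return the local toolkit's release via nvcc, or None if nvcc is not on PATH."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    try:
        import torch
        bundled = torch.version.cuda  # CUDA runtime the binary was built against
    except ImportError:
        bundled = None
    print("CUDA version PyTorch was built with:", bundled)
    print("Local CUDA toolkit (nvcc):", local_toolkit_version())
```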

Harsh_Mittal · April 26, 2024, 12:32pm

Hi @ptrblck, thank you for the response.

So I don’t have to install the CUDA toolkit, because PyTorch ships its own CUDA runtime (correct me if I’m wrong).

@ptrblck, you can see the screenshot. What would you suggest to fix this ‘old NVIDIA driver’ issue?

I just want to set up the P2 instance so that the latest PyTorch can use the GPUs. I just want to make it happen.

Looking forward to your response.

Thank You

ptrblck · April 30, 2024, 3:24pm

If your setup has issues communicating with the GPU, you might indeed want to update the driver, although I would assume the PyTorch + CUDA 11.8 binaries should just work with the older driver. Are you able to run any other CUDA application in this setup, just to make sure it’s indeed working?
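As one example of a CUDA check that does not involve PyTorch, the sketch below calls the CUDA driver API directly through `ctypes` (`cuInit`, `cuDeviceGetCount` and `cuDriverGetVersion` from `libcuda.so.1`), so it exercises only the driver, not a toolkit or the PyTorch runtime. The library name assumes a Linux driver install, and the helper names are hypothetical.

```python
import ctypes


def format_driver_version(raw):
    """cuDriverGetVersion encodes the version as 1000 * major + 10 * minor."""
    return f"{raw // 1000}.{(raw % 1000) // 10}"


def probe_driver(libname="libcuda.so.1"):
    """Return (device_count, newest supported CUDA version) or raise on failure."""
    lib = ctypes.CDLL(libname)  # raises OSError if the driver library is missing
    code = lib.cuInit(0)
    if code != 0:  # 0 == CUDA_SUCCESS
        raise RuntimeError(f"cuInit failed with CUresult {code}")
    count = ctypes.c_int()
    version = ctypes.c_int()
    code = lib.cuDeviceGetCount(ctypes.byref(count))
    if code != 0:
        raise RuntimeError(f"cuDeviceGetCount failed with CUresult {code}")
    code = lib.cuDriverGetVersion(ctypes.byref(version))
    if code != 0:
        raise RuntimeError(f"cuDriverGetVersion failed with CUresult {code}")
    return count.value, format_driver_version(version.value)


if __name__ == "__main__":
    try:
        devices, cuda_version = probe_driver()
        print(f"driver OK: {devices} device(s), supports CUDA up to {cuda_version}")
    except (OSError, RuntimeError) as exc:
        print("CUDA driver check failed:", exc)
```

`cuDriverGetVersion` reports the newest CUDA version the driver supports, which is the same figure `nvidia-smi` shows as “CUDA Version” (11.4 in this thread).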