Trying to set up a service, need some quick help with PyTorch/CUDA compatibility

Trying to set up a Python service. Hosting it on AWS on a g4dn.xlarge GPU instance running Amazon Linux 2023.

Driver: 565.57.01 (nvidia-smi reports CUDA 12.7)
Toolkit: CUDA Toolkit 12.6 Update 2
PyTorch: 2.5.1+cu124

I’ve been spinning on this trying different things. Do I just need to downgrade my toolkit or pytorch version or something obvious? Wanted to seek some clarification before I keep going round and round.

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01              Driver Version: 565.57.01      CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla T4                       Off |   00000000:00:1E.0 Off |                    0 |
| N/A   31C    P0             26W /   70W |       1MiB /  15360MiB |     10%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

Thanks for any help.

It’s unclear what the issue is. Your setup looks valid. The PyTorch binaries ship with their own CUDA runtime, so the locally installed CUDA toolkit won’t be used unless you build PyTorch from source or build a custom CUDA extension.
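
To narrow this down, it can help to print what the installed wheel and the driver each report. The following is a minimal sketch, not a definitive diagnostic: `cuda_report` is a hypothetical helper name, and it assumes `nvidia-smi` is on the PATH if a driver is installed. It only uses `torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`, and `torch.cuda.get_device_name()`.

```python
import shutil
import subprocess

try:
    import torch
except (ImportError, OSError):
    torch = None  # PyTorch is missing or its shared libraries failed to load


def cuda_report():
    """Collect what the PyTorch wheel and the NVIDIA driver each report.

    Hypothetical helper for illustration only.
    """
    report = {
        "torch_version": None,
        "wheel_cuda": None,       # CUDA runtime bundled with the wheel, e.g. "12.4"
        "cuda_available": False,
        "device_name": None,
        "driver_version": None,
    }
    if torch is not None:
        report["torch_version"] = torch.__version__
        report["wheel_cuda"] = torch.version.cuda  # None on CPU-only builds
        report["cuda_available"] = torch.cuda.is_available()
        if report["cuda_available"]:
            report["device_name"] = torch.cuda.get_device_name(0)
    # Ask the driver directly; assumes nvidia-smi is on PATH when a driver exists.
    if shutil.which("nvidia-smi"):
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=False,
        )
        if result.returncode == 0:
            report["driver_version"] = result.stdout.strip() or None
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If `wheel_cuda` is `None`, a CPU-only build of PyTorch is installed. If it shows a version but `cuda_available` is `False`, the problem is more likely on the driver or environment side than in the local toolkit.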

The issue was that it wasn’t finding a CUDA GPU. I ended up changing the instance to Ubuntu and it fired right up. I don’t really know what happened. But it works, so whatever, haha.


Just kidding. It was working because of a backup I built in, and it’s only using the CPU. So yeah, back to where I was at. Sigh.
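
For anyone hitting the same thing: a silent CPU fallback can hide exactly this kind of failure. Below is a small sketch of a fallback that either fails fast or logs a warning. `pick_device` and its `require_gpu` flag are hypothetical names, not part of PyTorch; the caller passes in the result of `torch.cuda.is_available()`.

```python
import logging

logger = logging.getLogger("device")


def pick_device(cuda_available: bool, require_gpu: bool = False) -> str:
    """Return "cuda" or "cpu", without hiding a missing GPU.

    Hypothetical helper: `cuda_available` is expected to be the value of
    torch.cuda.is_available() supplied by the caller.
    """
    if cuda_available:
        return "cuda"
    if require_gpu:
        # Fail fast so a broken driver setup is noticed at startup.
        raise RuntimeError("CUDA GPU required but not available")
    # Fall back, but leave a visible trace in the logs.
    logger.warning("CUDA not available, falling back to CPU")
    return "cpu"


# Typical use inside the service (assumes torch is installed):
#   import torch
#   device = torch.device(pick_device(torch.cuda.is_available(), require_gpu=True))
```

With `require_gpu=True` the service refuses to start on CPU, which makes it obvious whether a change to the instance or driver actually fixed GPU detection.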

I was finally able to get it working properly btw. Downgraded the drivers to 12.1 and that did the trick.