@ptrblck, can I get help with a similar issue? I have a K80 GPU machine with NVIDIA driver 470.82.01, CUDA 11.8, and PyTorch `2.0.1+cu117`. `torch.cuda.is_available()` returns `True`, and `torch.cuda.device_count()` returns 1. However, `torch.zeros(1, device="cuda")` raises `RuntimeError: No CUDA GPUs are available`, as does `torch.cuda.get_device_name()`; `!python -m torch.utils.collect_env` also fails with the same error.
PyTorch training with the GPU works if I install CUDA 10.2 on the machine. However, the K80 has compute capability 3.7, which should be supported by the installed driver and CUDA 11.8. `torch.cuda.get_arch_list()` returns `['sm_37', 'sm_50', 'sm_60', 'sm_70', 'sm_75', 'sm_80', 'sm_86']`, so `sm_37` is supported, correct? Why wouldn't it work? Is there any way to enable a more up-to-date driver? Thank you!
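For reference, here is a minimal script collecting the checks above in one place (my assumption, based on PyTorch's lazy CUDA initialization: `is_available()` and `device_count()` only query the driver, while the `RuntimeError` surfaces once a CUDA context is actually created, e.g. by allocating a tensor):

```python
import torch

# These succeed on my machine: the driver sees the K80.
print(torch.__version__)          # 2.0.1+cu117
print(torch.cuda.is_available())  # True
print(torch.cuda.device_count())  # 1
print(torch.cuda.get_arch_list()) # includes 'sm_37'

# Allocating a tensor forces CUDA context creation, which is
# where the failure actually happens for me.
try:
    t = torch.zeros(1, device="cuda")
    print(t)
except RuntimeError as e:
    print("CUDA init failed:", e)  # No CUDA GPUs are available
```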
`!nvidia-smi` shows:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.82.01 Driver Version: 470.82.01 CUDA Version: 11.8 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00000003:00:00.0 Off | 0 |
| N/A 32C P8 26W / 149W | 0MiB / 11441MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+