Frank1
(Frank)
February 23, 2024, 9:47am
1
pytorch version: 1.12.0+cu102
cuda version: release 11.6, V11.6.124
torch.cuda.get_device_name(0)
/usr/local/lib/python3.8/dist-packages/torch/cuda/__init__.py:146: UserWarning:
NVIDIA A100-SXM4-80GB with CUDA capability sm_80 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA A100-SXM4-80GB GPU with PyTorch, please check the instructions at Start Locally | PyTorch
warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
'NVIDIA A100-SXM4-80GB'
torch.cuda.get_device_name(1)
'NVIDIA A100-SXM4-80GB'
torch.cuda.get_device_name(2)
'NVIDIA A100-SXM4-80GB'
torch.cuda.get_device_name(3)
'NVIDIA A100-SXM4-80GB'
torch.cuda.get_device_name(4)
'NVIDIA A100-SXM4-80GB'
torch.cuda.get_device_name(5)
'NVIDIA A100-SXM4-80GB'
torch.cuda.get_device_name(6)
'NVIDIA A100-SXM4-80GB'
torch.cuda.get_device_name(7)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.8/dist-packages/torch/cuda/__init__.py", line 329, in get_device_name
return get_device_properties(device).name
File "/usr/local/lib/python3.8/dist-packages/torch/cuda/__init__.py", line 362, in get_device_properties
raise AssertionError("Invalid device i
ptrblck
February 23, 2024, 10:20pm
2
You’ve installed a PyTorch binary with CUDA 10.2, while your A100 Ampere GPU needs CUDA>=11.
Install any of the latest binaries and it’ll work.
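To confirm this kind of mismatch before reinstalling, it can help to print the CUDA version the binary was built with and the architectures it was compiled for. The sketch below is only an illustration and not PyTorch's own check: `is_arch_supported` is a hypothetical helper that assumes a GPU is usable when at least one compiled `sm_XY` entry shares its major compute capability.

```python
def is_arch_supported(capability, arch_list):
    """Hypothetical helper: True if any compiled sm_XY shares the GPU's major version."""
    major, _minor = capability
    compiled = [int(a.split("_")[1]) for a in arch_list if a.startswith("sm_")]
    return any(sm // 10 == major for sm in compiled)


def report():
    """Print build info and per-GPU support; needs a working PyTorch install."""
    import torch

    print("PyTorch:", torch.__version__, "| built with CUDA:", torch.version.cuda)
    arch_list = torch.cuda.get_arch_list()
    print("Compiled architectures:", arch_list)
    for i in range(torch.cuda.device_count()):
        cap = torch.cuda.get_device_capability(i)
        print(i, torch.cuda.get_device_name(i), cap, is_arch_supported(cap, arch_list))


if __name__ == "__main__":
    try:
        report()
    except ImportError:
        print("PyTorch is not installed in this environment")
```

With the architecture list from the warning above (sm_37 sm_50 sm_60 sm_70), the helper returns False for an sm_80 device, which matches what the warning reports.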
Frank1
(Frank)
February 26, 2024, 6:18am
3
I updated PyTorch and still have the same problem: the GPU with id 7 is missing.
import torch
torch.cuda.device_count()
7
torch.cuda.get_device_name(0)
'NVIDIA A100-SXM4-80GB'
torch.cuda.get_device_name(1)
'NVIDIA A100-SXM4-80GB'
torch.cuda.get_device_name(2)
'NVIDIA A100-SXM4-80GB'
torch.cuda.get_device_name(3)
'NVIDIA A100-SXM4-80GB'
torch.cuda.get_device_name(4)
'NVIDIA A100-SXM4-80GB'
torch.cuda.get_device_name(5)
'NVIDIA A100-SXM4-80GB'
torch.cuda.get_device_name(6)
'NVIDIA A100-SXM4-80GB'
torch.cuda.get_device_name(7)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/conda/lib/python3.8/site-packages/torch/cuda/__init__.py", line 329, in get_device_name
return get_device_properties(device).name
File "/opt/conda/lib/python3.8/site-packages/torch/cuda/__init__.py", line 362, in get_device_properties
raise AssertionError("Invalid device id")
AssertionError: Invalid device id
torch.__version__
'1.12.1+cu113'
ptrblck
February 26, 2024, 1:51pm
4
Could you describe what device 7 is according to nvidia-smi?
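One way to answer this in a form that is easy to compare against PyTorch's view is to ask the driver for a compact per-GPU listing. The sketch below assumes `nvidia-smi` is on the PATH and uses its `--query-gpu` and `--format=csv,noheader` options; `parse_gpu_csv` and `list_driver_gpus` are hypothetical helper names chosen for this example.

```python
import shutil
import subprocess


def parse_gpu_csv(text):
    """Parse 'index, bus_id, name' lines into (int, str, str) tuples."""
    rows = []
    for line in text.strip().splitlines():
        index, bus_id, name = [field.strip() for field in line.split(",", 2)]
        rows.append((int(index), bus_id, name))
    return rows


def list_driver_gpus():
    """Return the GPUs the driver reports, or [] if nvidia-smi is unavailable or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,pci.bus_id,name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    for index, bus_id, name in list_driver_gpus():
        print(index, bus_id, name)
```

If the driver lists more GPUs than `torch.cuda.device_count()` returns, the difference is in what the Python process is allowed to see, not in what is physically installed.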
Frank1
(Frank)
February 27, 2024, 1:36am
5
nvidia-smi does show device 7:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-SXM...  On   | 00000000:10:00.0 Off |                    0 |
| N/A   39C    P0    65W / 400W |      0MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-SXM...  On   | 00000000:16:00.0 Off |                    0 |
| N/A   36C    P0    66W / 400W |      0MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA A100-SXM...  On   | 00000000:2F:00.0 Off |                    0 |
| N/A   36C    P0    63W / 400W |      0MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA A100-SXM...  On   | 00000000:33:00.0 Off |                    0 |
| N/A   38C    P0    69W / 400W |      0MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   4  NVIDIA A100-SXM...  On   | 00000000:C5:00.0 Off |                    0 |
| N/A   37C    P0    66W / 400W |      0MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   5  NVIDIA A100-SXM...  On   | 00000000:CA:00.0 Off |                    0 |
| N/A   38C    P0    65W / 400W |      0MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   6  NVIDIA A100-SXM...  On   | 00000000:E3:00.0 Off |                    0 |
| N/A   36C    P0    67W / 400W |      0MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   7  NVIDIA A100-SXM...  On   | 00000000:E7:00.0 Off |                    0 |
| N/A   40C    P0    67W / 400W |      0MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
ptrblck
February 27, 2024, 3:23am
6
Are you able to run any code on cuda:7? If not, did you set CUDA_VISIBLE_DEVICES in your environment?
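Both questions can be checked from one short script. The sketch below is a rough diagnostic and not an official procedure: it prints `CUDA_VISIBLE_DEVICES` as the Python process sees it and then tries a tiny allocation on every device PyTorch reports. `visible_device_entries` and `probe` are hypothetical helper names. Note that device indices start at 0, so if `torch.cuda.device_count()` returns 7, only `cuda:0` through `cuda:6` are addressable from that process.

```python
import os


def visible_device_entries(value):
    """Split a CUDA_VISIBLE_DEVICES value into entries; None means the variable is unset."""
    if value is None:
        return None
    return [entry.strip() for entry in value.split(",") if entry.strip()]


def probe():
    """Try a tiny allocation on every device PyTorch can see."""
    import torch

    count = torch.cuda.device_count()
    print("torch.cuda.device_count():", count)
    for i in range(count):  # valid indices are 0 .. count - 1
        try:
            x = torch.ones(1, device=f"cuda:{i}")
            torch.cuda.synchronize(i)
            print(f"cuda:{i} OK ->", x.item())
        except RuntimeError as err:
            print(f"cuda:{i} failed:", err)


if __name__ == "__main__":
    entries = visible_device_entries(os.environ.get("CUDA_VISIBLE_DEVICES"))
    print("CUDA_VISIBLE_DEVICES:", "not set" if entries is None else entries)
    try:
        probe()
    except ImportError:
        print("PyTorch is not installed in this environment")
```

If the variable turns out to be set to a list such as `0,1,2,3,4,5,6`, the process only sees seven GPUs even though the driver reports eight. The variable has to be changed before the process initializes CUDA for the change to take effect.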