PyTorch install doesn't support CUDA 12.6 sm_75

I am running an official Docker (Holoscan) container on the NVIDIA Clara with Ubuntu 20.04 LTS.

  1. I installed PyTorch with: pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
  2. Then I tried running my code, which returns:
/home/holoscan/.local/lib/python3.10/site-packages/torch/cuda/__init__.py:235: UserWarning: 
Quadro RTX 6000 with CUDA capability sm_75 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_50 sm_80 sm_86 sm_89 sm_90 sm_90a.
If you want to use the Quadro RTX 6000 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
nvidia-smi
Fri Feb 21 10:01:21 2025       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.6     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Quadro RTX 6000                Off | 00000000:09:00.0  On |                  Off |
| 33%   34C    P5              38W / 260W |    795MiB / 24576MiB |      2%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Fri_Jun_14_16:44:37_PDT_2024
Cuda compilation tools, release 12.6, V12.6.20
Build cuda_12.6.r12.6/compiler.34431801_0
python
Python 3.10.12 (main, Jul 29 2024, 16:56:48) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.cuda.get_arch_list())
['sm_50', 'sm_80', 'sm_86', 'sm_89', 'sm_90', 'sm_90a']

Has sm_75 support been removed?
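For reference, the check behind that warning can be expressed as a small sketch (the `gpu_supported` helper below is hypothetical, not part of PyTorch): compare the wheel's compiled arch list against the GPU's compute capability, which PyTorch exposes via `torch.cuda.get_device_capability()`.

```python
def gpu_supported(arch_list, capability):
    """Return True if a wheel compiled for `arch_list` ships native (SASS)
    kernels for a GPU with the given (major, minor) compute capability.

    (A wheel can also run on newer GPUs via PTX JIT when a suitable
    compute_XY target is embedded, but the warning above shows that is
    not the case for this build.)
    """
    major, minor = capability
    return f"sm_{major}{minor}" in arch_list

# The cu126 wheel's arch list from above vs. the Quadro RTX 6000 (sm_75):
cu126_archs = ['sm_50', 'sm_80', 'sm_86', 'sm_89', 'sm_90', 'sm_90a']
print(gpu_supported(cu126_archs, (7, 5)))  # → False: no sm_75 kernels
print(gpu_supported(cu126_archs, (8, 6)))  # → True: e.g. an sm_86 GPU works
```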

  • I tried the same with an earlier version: pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124, which returns the same arch list:

    print(torch.cuda.get_arch_list())
    ['sm_50', 'sm_80', 'sm_86', 'sm_89', 'sm_90', 'sm_90a']
    
  • I also tried another version: pip install torch==2.0.1 --index-url https://download.pytorch.org/whl/cu118, which gives me this error when trying to run my code:

    [error] [gxf_wrapper.cpp:100] Exception occurred for operator: 'optimizeData' - AssertionError: Torch not compiled with CUDA enabled
    
    At:
      /home/holoscan/.local/lib/python3.10/site-packages/torch/cuda/__init__.py(239): _lazy_init
      /workspace/volumes/data/ba_hyperspectral_segmentation/move_to_holohub/holohub/applications/biopsy_app_cupy/optimisation_helicoid.py(395): optimize_single_b_torch_np
      /workspace/volumes/data/ba_hyperspectral_segmentation/move_to_holohub/holohub/applications/biopsy_app_cupy/optimisation_helicoid.py(431): helicoid_optimisation_ti_parallel_torch_np
      /workspace/volumes/data/ba_hyperspectral_segmentation/move_to_holohub/holohub/applications/biopsy_app_cupy/biopsy_application.py(362): compute
    
    [error] [entity_executor.cpp:596] Failed to tick codelet optimizeData in entity: optimizeData code: GXF_FAILURE
    [warning] [greedy_scheduler.cpp:243] Error while executing entity 53 named 'optimizeData': GXF_FAILURE
    [info] [greedy_scheduler.cpp:401] Scheduler finished.
    [error] [program.cpp:580] wait failed. Deactivating...
    [error] [runtime.cpp:1649] Graph wait failed with error: GXF_FAILURE
    [warning] [gxf_executor.cpp:2241] GXF call GxfGraphWait(context) in line 2241 of file /workspace/holoscan-sdk/src/core/executors/gxf/gxf_executor.cpp failed with 'GXF_FAILURE' (1)
    [info] [gxf_executor.cpp:2251] Graph execution finished.
    [error] [gxf_executor.cpp:2259] Graph execution error: GXF_FAILURE
    Traceback (most recent call last):
      File "/workspace/volumes/data/ba_hyperspectral_segmentation/move_to_holohub/holohub/applications/biopsy_app_cupy/biopsy_application.py", line 527, in <module>
        main()
      File "/workspace/volumes/data/ba_hyperspectral_segmentation/move_to_holohub/holohub/applications/biopsy_app_cupy/biopsy_application.py", line 520, in main
        app.run()
      File "/workspace/volumes/data/ba_hyperspectral_segmentation/move_to_holohub/holohub/applications/biopsy_app_cupy/biopsy_application.py", line 362, in compute
        errors, coef_list, scattering_params, errors_scatter = helicoid_optimisation_ti_parallel_torch_np(t1, b, M, x)
      File "/workspace/volumes/data/ba_hyperspectral_segmentation/move_to_holohub/holohub/applications/biopsy_app_cupy/optimisation_helicoid.py", line 431, in helicoid_optimisation_ti_parallel_torch_np
        results = optimize_single_b_torch_np(range(b.shape[0]), b, [b_t1]*b.shape[0], [a_t1]*b.shape[0], [M]*b.shape[0], [x]*b.shape[0], [current_x]*b.shape[0], [left_bound]*b.shape[0], [right_bound]*b.shape[0])
      File "/workspace/volumes/data/ba_hyperspectral_segmentation/move_to_holohub/holohub/applications/biopsy_app_cupy/optimisation_helicoid.py", line 395, in optimize_single_b_torch_np
        b_i = torch.as_tensor(b_i, device='cuda')
      File "/home/holoscan/.local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 239, in _lazy_init
        raise AssertionError("Torch not compiled with CUDA enabled")
    AssertionError: Torch not compiled with CUDA enabled
    

    torch.cuda.get_arch_list() returns an empty list:

    >>> import torch
    >>> print(torch.cuda.get_arch_list())
    []
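The outcomes above can be summarized with a small hypothetical classifier over `torch.version.cuda`, `torch.cuda.get_arch_list()`, and the GPU's arch string. An empty arch list (typically together with `torch.version.cuda` being `None`) suggests pip resolved a CPU-only wheel, which would explain the "Torch not compiled with CUDA enabled" assertion; the thresholds and wording here are illustrative, not PyTorch API.

```python
def diagnose(cuda_build, arch_list, gpu_sm):
    """Classify a PyTorch install.

    cuda_build: torch.version.cuda (None for CPU-only wheels)
    arch_list:  torch.cuda.get_arch_list()
    gpu_sm:     the local GPU's arch string, e.g. "sm_75"
    """
    if cuda_build is None or not arch_list:
        return "CPU-only wheel: no CUDA kernels at all"
    if gpu_sm not in arch_list:
        return f"CUDA wheel without {gpu_sm} kernels: use an older wheel or build from source"
    return "wheel supports this GPU natively"

# The two failure modes from this thread, for a Quadro RTX 6000 (sm_75):
print(diagnose("12.6", ['sm_50', 'sm_80', 'sm_86', 'sm_89', 'sm_90', 'sm_90a'], "sm_75"))
print(diagnose(None, [], "sm_75"))  # the torch==2.0.1 result above
```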
    

Which PyTorch version do I have to use, and if it is an older one, won't that reduce performance?

This might be related to PyTorch 1.6 - "Tesla T4 with CUDA capability sm_75 is not compatible", but the versions used there are much older than the ones I want to use.

PyTorch binaries for ARM with CUDA dependencies support the Hopper and Blackwell architectures. If you have a valid use case for adding more architectures, please create a feature request, or build from source in the meantime.
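For the build-from-source route, a rough sketch follows (untested here; the repository URL is PyTorch's official one, everything else is the standard source-build flow): restricting `TORCH_CUDA_ARCH_LIST` to `7.5` compiles only Turing (sm_75) kernels, which also keeps the build considerably smaller.

```shell
# Sketch: build PyTorch from source with Turing (sm_75) kernels enabled.
# TORCH_CUDA_ARCH_LIST selects which compute capabilities are compiled in.
export TORCH_CUDA_ARCH_LIST="7.5"

git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
pip install -r requirements.txt
python setup.py develop
```

After a successful build, `torch.cuda.get_arch_list()` should report `['sm_75']`.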


Glad to see this. I'm trying to run PyTorch on AWS G5g spot instances; they run Amazon Linux (ARM) and have T4 GPUs, and I'm hitting the same issue. I could file a feature request; for now, I will look at building from source. Thanks.