`sm_89` not listed in the `torch.cuda.get_arch_list()`

We’re getting the docker image from here: nvcr.io/nvidia/pytorch:24.12-py3

and when we query the arch list:

import torch
torch.cuda.get_arch_list()

# ['sm_70', 'sm_75', 'sm_80', 'sm_86', 'sm_90', 'compute_90']

torch.version.cuda

# '12.6'

`sm_89` is not listed. We are running 4x NVIDIA RTX 6000 Ada cards.
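For context, CUDA binaries are compatible with devices of the same major compute capability and an equal or higher minor revision, while `compute_XY` PTX entries can be JIT-compiled for any device of capability `(X, Y)` or newer. A rough sketch of that rule (helper names are mine, two-digit capabilities assumed):

```python
def parse_arch(arch: str) -> tuple[int, int]:
    """Split e.g. 'sm_86' or 'compute_90' into (major, minor)."""
    _, num = arch.split("_")
    return int(num[0]), int(num[1])

def device_is_covered(device_cc: tuple[int, int], arch_list: list[str]) -> bool:
    """Check whether a device with compute capability `device_cc` can run a
    binary built for `arch_list`, using two CUDA compatibility rules:
    - 'sm_XY' cubins run on devices with the same major CC and minor >= Y
    - 'compute_XY' PTX can be JIT-compiled for devices with CC >= (X, Y)
    """
    for arch in arch_list:
        major, minor = parse_arch(arch)
        if arch.startswith("sm_"):
            if device_cc[0] == major and device_cc[1] >= minor:
                return True
        else:  # 'compute_XY' PTX entry
            if device_cc >= (major, minor):
                return True
    return False

arch_list = ['sm_70', 'sm_75', 'sm_80', 'sm_86', 'sm_90', 'compute_90']
print(device_is_covered((8, 9), arch_list))  # RTX 6000 Ada is sm_89 -> True
```

By this rule the `sm_80`/`sm_86` entries already cover an `sm_89` device at the binary level, which is why the arch list above can still run on Ada cards.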

We also tried installing the latest version of torch via pip, but still don’t see sm_89 listed. Do we have to do something special to enable it?

Here is the CUDA information if needed:

root@~ $ nvidia-smi --version
NVIDIA-SMI version  : 565.57.01
NVML version        : 565.57
DRIVER version      : 565.57.01
CUDA Version        : 12.7

Thanks!

No, you don’t need to build for `sm_89`, as it’s binary compatible with `sm_86`/`sm_80`. Your device is thus supported in all of our builds.

Thanks @ptrblck, so there shouldn’t be any issues running FP8-specific kernels in PyTorch? Those aren’t part of sm_86…

More specifically, I’m unable to utilize row-wise scaling in FP8.
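For reference, here is roughly what I’m attempting: a row-wise scaled FP8 matmul through `torch._scaled_mm` (a private API whose exact signature varies across PyTorch versions; shapes and scales below are illustrative only, and the call is guarded since it needs a CUDA device):

```python
import torch

# Hypothetical repro sketch for row-wise scaled FP8 matmul. On builds
# without sm_89 FP8 support, the _scaled_mm call raises a RuntimeError.
status = "skipped: no CUDA device"
if torch.cuda.is_available():
    M, K, N = 16, 32, 64
    a = torch.randn(M, K, device="cuda").to(torch.float8_e4m3fn)
    # _scaled_mm expects the second operand in column-major layout
    b = torch.randn(N, K, device="cuda").to(torch.float8_e4m3fn).t()
    scale_a = torch.ones(M, 1, device="cuda")  # per-row scales
    scale_b = torch.ones(1, N, device="cuda")  # per-column scales
    try:
        out = torch._scaled_mm(a, b, scale_a=scale_a, scale_b=scale_b,
                               out_dtype=torch.bfloat16)
        status = f"ok: {tuple(out.shape)}"
    except RuntimeError as e:
        status = f"failed: {e}"  # e.g. unsupported on this arch/build
print(status)
```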

The response from the thread was that sm_89 should be listed in the arch_list.

Thanks!

I don’t know if the current row-wise scaling kernel implementation is compatible with sm_89; we are explicitly building those kernels for sm_90a (arch-conditional) here.


There was a recent PR - Add SM89 support for f8f8bf16_rowwise() by alexsamardzic · Pull Request #144348 · pytorch/pytorch · GitHub

which “introduced support for _scaled_mm operator with FP8 inputs on SM89 architecture. The support is based on CUTLASS library, that is header-only C++ library, so this new functionality gets fully built along with PyTorch build; however, it will get built only in case the build includes SM89 among targets.”

Unfortunately, this requires that sm_89 is among the build targets.
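For anyone building from source in the meantime, the set of CUDA targets is controlled by the `TORCH_CUDA_ARCH_LIST` environment variable, so including 8.9 there should pick up the new kernels (arch list below is just an example):

```shell
# Build PyTorch from source with Ada (8.9) included among the CUDA targets.
export TORCH_CUDA_ARCH_LIST="8.0;8.6;8.9;9.0"
echo "$TORCH_CUDA_ARCH_LIST"
# then, from a pytorch checkout, e.g.: python setup.py develop
```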

I just opened a ticket Support `sm_89` in Stable/Nightly/Docker Images · Issue #145632 · pytorch/pytorch · GitHub

Does this make sense?

No, we should not add sm_89 directly, as it would increase the binary size with no benefit besides FP8 support. Instead, we should add sm_89 only to the one file implementing FP8 support, as explained in my comment on GitHub.
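A sketch of that per-file approach in terms of nvcc flags (the flag string is illustrative, not the actual CMake wiring): the rowwise FP8 source is currently built with an arch-conditional sm_90a gencode entry, and adding an sm_89 entry for just that file would enable it on Ada without growing the global arch list.

```shell
# Hypothetical per-file gencode flags for the FP8 rowwise source only:
FP8_GENCODE_FLAGS="-gencode arch=compute_89,code=sm_89 -gencode arch=compute_90a,code=sm_90a"
echo "$FP8_GENCODE_FLAGS"
```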