How to get the `shared memory capacity` for sm_89?

Hey there,

Is there a PyTorch method for fetching the shared memory capacity of sm_89 chips (specifically the NVIDIA RTX 6000 Ada cards)?

It’s my understanding that sm_86 has a shared memory capacity of ~100 kB, which caps the head dimension at 64 for flash attention. The A100s (sm_80) have 164 kB, which allows a head dimension of 128. I’d like to check this value for sm_89 so I can work out the maximum head dimension.
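To make the figures above concrete, here is a rough sketch of the lookup in question, keyed on compute capability. The table values are only the numbers quoted in this post, not anything read from PyTorch or from the hardware, and `assumed_max_head_dim` is a hypothetical helper name. sm_89 is deliberately left out because its value is the open question.

```python
# Head-dimension caps as stated in this post (assumptions, not measured values).
# Keys are (major, minor) compute capability tuples.
ASSUMED_MAX_HEAD_DIM = {
    (8, 0): 128,  # sm_80 (A100), ~164 kB shared memory
    (8, 6): 64,   # sm_86, ~100 kB shared memory
}


def assumed_max_head_dim(major, minor):
    """Return the assumed flash attention head-dim cap, or None if unknown."""
    return ASSUMED_MAX_HEAD_DIM.get((major, minor))
```

On a machine with a GPU, the key can be taken from `torch.cuda.get_device_capability()`, which returns a `(major, minor)` tuple, e.g. `assumed_max_head_dim(*torch.cuda.get_device_capability())`.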

Is there another name for the shared memory capacity? I don’t think it’s the same as the L1 cache. What should I look for in the Ampere/Ada Lovelace/Hopper white papers?

reference: NVIDIA Ampere GPU Architecture Tuning Guide

One more thing: Is there a torch equivalent to this:

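    from numba.cuda.cudadrv import enums
    from numba import cuda

    device = cuda.get_current_device()
    # Collect every attribute name the driver enum exposes,
    # with the common prefix stripped.
    attribs = [
        name.replace("CU_DEVICE_ATTRIBUTE_", "")
        for name in dir(enums)
        if name.startswith("CU_DEVICE_ATTRIBUTE_")
    ]
    # Attributes are resolved lazily on access, so query each one by name.
    for attr in attribs:
        print(attr, "=", getattr(device, attr))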

I tried:


but it doesn’t return all the attributes. In the numba implementation, the device attributes are appended lazily, hence the iteration through the enum.

Thank you!

sm_89 should have the same restriction as sm_86, barring future improvements to flash attention implementations. The documented shared memory capacity is in the Ada tuning guide, the counterpart of the Ampere guide you linked: NVIDIA Ada GPU Architecture Tuning Guide

This limitation is enforced in upstream PyTorch’s flash attention here: pytorch/sdp_utils.cpp at 8994d9e6109c541a1d581c383e4de9ed68205d91 · pytorch/pytorch · GitHub
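On the attribute-listing question, a minimal sketch: `torch.cuda.get_device_properties` returns a properties object, and iterating over `dir()` of it prints whatever fields the installed PyTorch build happens to expose. Which fields exist (in particular, whether any shared memory field is present) depends on the PyTorch version, so nothing here assumes a specific field name. `public_attributes` and `cuda_device_attributes` are hypothetical helper names.

```python
def public_attributes(obj):
    """Return {name: value} for each public, non-callable attribute of obj."""
    result = {}
    for name in dir(obj):
        if name.startswith("_"):
            continue  # skip private and dunder names
        try:
            value = getattr(obj, name)
        except Exception:
            continue  # skip attributes that fail on access
        if callable(value):
            continue  # skip methods
        result[name] = value
    return result


def cuda_device_attributes(index=0):
    """Return the public device properties PyTorch exposes for one GPU.

    Returns an empty dict when PyTorch or a CUDA device is unavailable.
    """
    try:
        import torch  # imported lazily so the helper above works without it
    except ImportError:
        return {}
    if not torch.cuda.is_available():
        return {}
    return public_attributes(torch.cuda.get_device_properties(index))
```

Calling `cuda_device_attributes()` and printing the sorted items gives a listing comparable to the numba loop, though limited to what PyTorch wraps rather than every driver attribute.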

Thanks @eqy - but then how are they able to support a head_dim of 96 here:

I believe that would qualify as an improvement to the implementation rather than relying on the current kernels.
