Is there a PyTorch method for fetching the shared memory capacity for the sm_89 series of chips (specifically the NVIDIA A6000 Ada cards)?
It’s my understanding that sm_86 has a shared memory capacity of ~100 KB, which caps the head dimension at 64 for flash attention. The A100s (sm_80) have 164 KB, which allows a head dimension of 128. I’d like to check this value for sm_89 so I can figure out the max head dimension.
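Ideally something like the sketch below would work straight from torch. This is an assumption on my part: I believe newer PyTorch builds expose shared-memory fields on the object returned by `torch.cuda.get_device_properties()`, but older builds may not, so it falls back gracefully.

```python
import torch

# Sketch: query the device-properties struct for shared-memory fields.
# Assumption: recent PyTorch builds expose fields with these names; if a
# field is missing on this build, report that instead of failing.
props = torch.cuda.get_device_properties(0)
print(f"{props.name} (sm_{props.major}{props.minor})")

for field in ("shared_memory_per_multiprocessor", "shared_memory_per_block_optin"):
    value = getattr(props, field, None)
    if value is not None:
        print(f"{field} = {value} bytes")
    else:
        print(f"{field} is not exposed by this PyTorch version")
```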
Is there another name for the shared memory capacity? I don’t think it’s the same as the L1 cache. What should I look for in the Ampere/Ada Lovelace/Hopper white papers?
reference: NVIDIA Ampere GPU Architecture Tuning Guide
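For what it’s worth, these are the shared-memory attribute names the CUDA driver API itself uses, filtered out of numba’s enums. I’m assuming the per-multiprocessor one is what the white papers mean by shared memory capacity, as opposed to the per-block limits:

```python
from numba.cuda.cudadrv import enums

# List the driver-API device attributes that mention shared memory, to see
# what the "capacity" figure is actually called.
# Assumption: MAX_SHARED_MEMORY_PER_MULTIPROCESSOR is the per-SM capacity,
# while the *_PER_BLOCK variants are the per-block limits.
shared_attrs = [
    name for name in dir(enums)
    if name.startswith("CU_DEVICE_ATTRIBUTE_") and "SHARED" in name
]
for name in shared_attrs:
    print(name)
```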
One more thing: Is there a torch equivalent to this:
```python
from numba.cuda.cudadrv import enums
from numba import cuda

device = cuda.get_current_device()
attribs = [
    name.replace("CU_DEVICE_ATTRIBUTE_", "")
    for name in dir(enums)
    if name.startswith("CU_DEVICE_ATTRIBUTE_")
]
for attr in attribs:
    print(attr, "=", getattr(device, attr))
```
but it doesn’t return all the attributes. In the numba implementation, the device attributes are looked up lazily, hence the iteration over the CU_DEVICE_ATTRIBUTE_* enum names.
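On the torch side, the closest I’ve come up with is dumping whatever fields the current build exposes on the device-properties object. This is only a sketch: the available fields vary by PyTorch version and are a subset of the full driver attribute list.

```python
import torch

# Dump every public field on the device-properties struct for device 0.
# The set of fields depends on the installed PyTorch version.
props = torch.cuda.get_device_properties(0)
for name in dir(props):
    if not name.startswith("_"):
        print(name, "=", getattr(props, name))
```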