I wonder, what is the SDPA backend support matrix for a custom attn_mask?
This FA issue suggests that it does not support attn_mask:
What backends support custom attn_mask?
Thanks!
On CUDA, this is Efficient Attention and cuDNN.
Do any of these skip computing attention for empty mask blocks? Or do I need Flex for this?
If so, Flex Attention should be in there too, as an opt-in backend.