SDPA backends supporting attn_mask

I wonder: what is the SDPA backend support matrix for a custom attn_mask?

This FlashAttention issue suggests that FA does not support attn_mask:

Which backends do support a custom attn_mask?

Thanks!

On CUDA, the backends that support a custom attn_mask are Efficient Attention and cuDNN.

Do any of these skip computing attention for fully-masked blocks? Or do I need FlexAttention for that?

If so, FlexAttention should be in there too, as an opt-in backend :slight_smile: