I wonder, what is the SDPA backend support matrix for a custom attn_mask?
This FA issue suggests that it does not support attn_mask:
What backends support custom attn_mask?
Thanks!
On CUDA, this is Efficient Attention and cuDNN.
Do any of these skip computing attention for empty mask blocks? Or do I need Flex for this?
If so, Flex Attention should be in there too, as an opt-in backend.