SDPA backend routing requirements

What are the input requirements that tell SDPA to use the memory-efficient, flash-attention, or math backend? I saw some repos use is_causal=True instead of an attention_mask in order to dispatch to the flash-attention backend, but where are these requirements documented?
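
For reference, here is roughly how I have been probing the dispatch behaviour, a minimal sketch assuming PyTorch >= 2.3 (where sdpa_kernel lives in torch.nn.attention) and a CUDA device. Forcing a single backend makes the call fail instead of silently falling back, which at least reveals when an input combination is rejected:

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

# fp16 CUDA tensors in (batch, heads, seq_len, head_dim) layout,
# which I understand the flash backend expects
q = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)
k = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)
v = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)

# With the flash backend forced, is_causal=True goes through fine
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

# But passing an explicit boolean causal mask instead seems to be
# rejected ("No available kernel"), which would explain why those
# repos use is_causal=True rather than attn_mask
mask = torch.ones(128, 128, device="cuda", dtype=torch.bool).tril()
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    try:
        out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
    except RuntimeError as e:
        print("flash backend rejected attn_mask:", e)
```

Trial and error like this works, but I would rather read the actual constraints (dtype, device, head_dim, mask support, etc.) somewhere authoritative.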
Also, is the SDPA flash-attention implementation the same as the original GitHub - Dao-AILab/flash-attention: Fast and memory-efficient exact attention?