F.scaled_dot_product_attention: FlashAttention-1 or FlashAttention-2?


I want to know exactly which FlashAttention version F.scaled_dot_product_attention uses as its backend.

I am running some benchmarks and following this tutorial: (Beta) Implementing High-Performance Transformers with Scaled Dot Product Attention (SDPA), PyTorch Tutorials 2.3.0+cu121 documentation.

The code says the backend of F.scaled_dot_product_attention is FlashAttention-1, but the documentation (torch.nn.functional.scaled_dot_product_attention, PyTorch 2.2 documentation) says it is FlashAttention-2.