I wanted to know whether PyTorch is using V2 of FlashAttention here: torch.nn.functional.scaled_dot_product_attention — PyTorch master documentation
The function's documentation doesn't say; only V1 is mentioned (link above). However, it seems to be the case according to the blog:
So, is FlashAttention V2 implemented or not?
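In the meantime, one thing you can do is check which SDPA backends your build exposes and whether the flash backend is enabled. This is only a minimal sketch, assuming a torch >= 2.0 install where the per-backend toggle functions in `torch.backends.cuda` exist; it doesn't tell you the FlashAttention *version*, just whether the flash kernel path is available and enabled:

```python
import torch

# Installed PyTorch version; whether the flash kernel is V1 or V2
# depends on the release you have installed.
print("torch version:", torch.__version__)

# Global enable flags for the three SDPA backends used by
# torch.nn.functional.scaled_dot_product_attention.
print("flash backend enabled:        ", torch.backends.cuda.flash_sdp_enabled())
print("mem-efficient backend enabled:", torch.backends.cuda.mem_efficient_sdp_enabled())
print("math backend enabled:         ", torch.backends.cuda.math_sdp_enabled())
```

You can also force a single backend with the `torch.backends.cuda.sdp_kernel` context manager (e.g. `enable_flash=True` and the others `False`); the call will then error out if the flash kernel can't actually serve your inputs, which is a quick way to confirm it is being dispatched.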