Why does the F.scaled_dot_product_attention output differ from normal attention in this case?
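
For reference, this is the kind of comparison I mean (a minimal sketch with made-up shapes, not the exact case from my code): computing attention manually as softmax(QK^T / sqrt(d)) V versus calling the fused function, which by default uses the same 1/sqrt(d) scaling.

```python
import math
import torch
import torch.nn.functional as F

# Hypothetical shapes for illustration: (batch * heads, seq_len, head_dim)
q = torch.randn(16, 77, 64)
k = torch.randn(16, 77, 64)
v = torch.randn(16, 77, 64)

# "Normal" attention: softmax(Q K^T / sqrt(d)) V
scores = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])
manual_out = scores.softmax(dim=-1) @ v

# Fused implementation; uses the same 1/sqrt(d) scale by default
sdpa_out = F.scaled_dot_product_attention(q, k, v)

# Expected to agree up to floating-point tolerance of the fused kernel
print(torch.allclose(manual_out, sdpa_out, atol=1e-4))
```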

Also, what does the function reshape_batch_dim_to_heads do?
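
(The name reshape_batch_dim_to_heads looks like it comes from Hugging Face diffusers' attention code; assuming that context, my understanding is that it folds the head dimension back out of the batch dimension, i.e. the inverse of reshape_heads_to_batch_dim. A sketch of that behavior, written as a standalone function rather than the library method:)

```python
import torch

def reshape_batch_dim_to_heads(tensor: torch.Tensor, heads: int) -> torch.Tensor:
    """Merge (batch * heads, seq_len, head_dim) back into (batch, seq_len, heads * head_dim)."""
    batch_times_heads, seq_len, head_dim = tensor.shape
    batch = batch_times_heads // heads
    tensor = tensor.reshape(batch, heads, seq_len, head_dim)
    tensor = tensor.permute(0, 2, 1, 3)                      # (batch, seq_len, heads, head_dim)
    return tensor.reshape(batch, seq_len, heads * head_dim)  # concatenate heads back together

# Example: 2 samples, 8 heads, 77 tokens, 64 dims per head
x = torch.randn(2 * 8, 77, 64)
print(reshape_batch_dim_to_heads(x, heads=8).shape)  # torch.Size([2, 77, 512])
```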