Why does the F.scaled_dot_product_attention output differ from normal attention in this case?
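
For reference, this is the kind of comparison I mean (a minimal sketch with made-up shapes, not the exact case from my code): computing attention manually as softmax(QK^T / sqrt(d)) V versus calling the fused function, which by default uses the same 1/sqrt(d) scaling.

```python
import math
import torch
import torch.nn.functional as F

# Hypothetical shapes for illustration: (batch * heads, seq_len, head_dim)
q = torch.randn(16, 77, 64)
k = torch.randn(16, 77, 64)
v = torch.randn(16, 77, 64)

# "Normal" attention: softmax(Q K^T / sqrt(d)) V
scores = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])
manual_out = scores.softmax(dim=-1) @ v

# Fused implementation; uses the same 1/sqrt(d) scale by default
sdpa_out = F.scaled_dot_product_attention(q, k, v)

# Expected to agree up to floating-point tolerance of the fused kernel
print(torch.allclose(manual_out, sdpa_out, atol=1e-4))
```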

Also, what does the function reshape_batch_dim_to_heads do?
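
(The name reshape_batch_dim_to_heads looks like it comes from Hugging Face diffusers' attention code; assuming that context, my understanding is that it folds the head dimension back out of the batch dimension, i.e. the inverse of reshape_heads_to_batch_dim. A sketch of that behavior, written as a standalone function rather than the library method:)

```python
import torch

def reshape_batch_dim_to_heads(tensor: torch.Tensor, heads: int) -> torch.Tensor:
    """Merge (batch * heads, seq_len, head_dim) back into (batch, seq_len, heads * head_dim)."""
    batch_times_heads, seq_len, head_dim = tensor.shape
    batch = batch_times_heads // heads
    tensor = tensor.reshape(batch, heads, seq_len, head_dim)
    tensor = tensor.permute(0, 2, 1, 3)                      # (batch, seq_len, heads, head_dim)
    return tensor.reshape(batch, seq_len, heads * head_dim)  # concatenate heads back together

# Example: 2 samples, 8 heads, 77 tokens, 64 dims per head
x = torch.randn(2 * 8, 77, 64)
print(reshape_batch_dim_to_heads(x, heads=8).shape)  # torch.Size([2, 77, 512])
```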