Specific implementation of the function scaled_dot_product_attention?

https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html#torch-nn-functional-scaled-dot-product-attention
The implementation shown in this document does not actually behave the same as torch.nn.functional.scaled_dot_product_attention.
I reached this conclusion by experiment: I replaced the call to the built-in function with the implementation described in the document and compared the results, and the outputs differed.
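For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of how a naive implementation can be checked against the built-in. It assumes the simplest case only (no attention mask, no dropout, default 1/sqrt(d_k) scaling); `manual_sdpa` is my own illustrative name, not something from the document or from PyTorch.

```python
import torch
import torch.nn.functional as F

def manual_sdpa(query, key, value):
    # Naive reference: softmax(Q K^T / sqrt(d_k)) V
    # (no mask, no dropout, default scaling)
    d_k = query.size(-1)
    scores = query @ key.transpose(-2, -1) / (d_k ** 0.5)
    weights = torch.softmax(scores, dim=-1)
    return weights @ value

torch.manual_seed(0)
# (batch, heads, sequence length, head dim)
q = torch.randn(2, 4, 8, 16)
k = torch.randn(2, 4, 8, 16)
v = torch.randn(2, 4, 8, 16)

builtin = F.scaled_dot_product_attention(q, k, v)
manual = manual_sdpa(q, k, v)

# Fused kernels can introduce small numerical differences,
# so compare with a tolerance rather than exact equality.
print(torch.allclose(builtin, manual, atol=1e-5))
```

If the document's implementation diverges beyond such a tolerance, the difference is likely in masking, dropout, or the scaling factor rather than in floating-point noise.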