Hi,
I want to get the gradient of the attention map.
So I tried to attach a hook to the torch.nn.functional._scaled_dot_product_attention function.
However, I cannot find the source code of _scaled_dot_product_attention anywhere in the PyTorch GitHub repository.
Where can I find it?
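For reference, here is a minimal sketch of the goal (the gradient of the attention map), assuming the attention weights are computed by hand rather than through the private function. The tensor shapes, the names q, k, v, and the sum loss are placeholders chosen only for illustration.

```python
import math
import torch

torch.manual_seed(0)

# Hypothetical toy shapes: (batch, heads, sequence length, head dimension).
q = torch.randn(1, 2, 4, 8, requires_grad=True)
k = torch.randn(1, 2, 4, 8, requires_grad=True)
v = torch.randn(1, 2, 4, 8, requires_grad=True)

# Scaled dot-product attention written out explicitly, so the attention map
# is an ordinary intermediate tensor in the autograd graph.
scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
attn = torch.softmax(scores, dim=-1)

captured = {}
# Tensor.register_hook calls the function with d(loss)/d(attn) during backward.
# dict.update returns None, so the gradient itself is left unchanged.
attn.register_hook(lambda grad: captured.update(attn_grad=grad))

out = attn @ v
loss = out.sum()  # placeholder loss, only for illustration
loss.backward()

print(captured["attn_grad"].shape)
```

With this approach the hook sits on the attention tensor itself, so it does not depend on where the fused attention function is implemented.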
Thank you, and Happy New Year!