I am trying to understand how spatial reduction attention works but I could not find any source. You can see the formulas below
and related paper: https://arxiv.org/pdf/2102.12122.pdf
Can someone describe it, an example would be great.
I am trying to understand how spatial reduction attention works but I could not find any source. You can see the formulas below
and related paper: https://arxiv.org/pdf/2102.12122.pdf
Can someone describe it, an example would be great.
I think it works like Linformer, you can see it here: Linformer: Self-Attention with Linear Complexity (Paper Explained) - YouTube