How can I implement this attention?

Hello, I'm trying to use linear self-attention on images, but applying it to a 128x128 image consumes too much memory. Can I form patches (e.g. 8x8) for the query and key with conv layers, compute the attention over those patches, and then apply that attention to the 128x128 value matrix, so that one element of the attention map acts on an 8x8 = 64-element block of the value matrix?
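A minimal sketch of the idea described above, under my own assumptions about the shapes (the module name `PatchQKFullVAttention` and all hyperparameters are hypothetical, not from any library): queries and keys are produced by strided convs that act as 8x8 patch embeddings, so the attention matrix is only 256x256 for a 128x128 input instead of 16384x16384, and each attention weight then redistributes a whole 8x8 block of the full-resolution value map:

```python
import torch
import torch.nn as nn


class PatchQKFullVAttention(nn.Module):
    """Hypothetical sketch: Q/K computed on 8x8 patches via strided convs,
    the resulting (n x n) patch attention applied to full-resolution values
    grouped into matching 8x8 blocks."""

    def __init__(self, channels, patch=8, dim=64):
        super().__init__()
        self.patch = patch
        # Strided convs act as patch embeddings: 128x128 -> 16x16 tokens.
        self.to_q = nn.Conv2d(channels, dim, kernel_size=patch, stride=patch)
        self.to_k = nn.Conv2d(channels, dim, kernel_size=patch, stride=patch)
        self.to_v = nn.Conv2d(channels, channels, kernel_size=1)  # full res
        self.scale = dim ** -0.5

    def forward(self, x):
        b, c, h, w = x.shape
        p = self.patch
        hp, wp = h // p, w // p      # patches per side (16 for 128/8)
        n = hp * wp                  # number of tokens (256)

        q = self.to_q(x).flatten(2).transpose(1, 2)   # (b, n, dim)
        k = self.to_k(x).flatten(2).transpose(1, 2)   # (b, n, dim)
        attn = (q @ k.transpose(1, 2)) * self.scale   # (b, n, n)
        attn = attn.softmax(dim=-1)

        v = self.to_v(x)                              # (b, c, h, w)
        # Group full-resolution values into 8x8 patches:
        # (b, c, hp, p, wp, p) -> (b, n, c*p*p)
        v = v.reshape(b, c, hp, p, wp, p).permute(0, 2, 4, 1, 3, 5)
        v = v.reshape(b, n, c * p * p)

        out = attn @ v                                # (b, n, c*p*p)
        # Scatter the patches back onto the image grid.
        out = out.reshape(b, hp, wp, c, p, p).permute(0, 3, 1, 4, 2, 5)
        out = out.reshape(b, c, h, w)
        return out + x                                # residual connection
```

One caveat with this scheme: every pixel inside a patch receives the same attention weights, so the output is blocky at the patch boundaries; a 1x1 conv or a small refinement layer after the module may help smooth that out.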