What's the difference between self-attention and spatial attention?

As I understand it, self-attention was proposed to capture the contextual information of a sequence. For example, consider the sentence “I swam across the river until I reached the next bank”. Here the embedding of the word “bank” in d-dimensional space may lie far from that of “river”, and the goal of self-attention is to bring them closer together geometrically, so that “bank” is represented in the context of “river”. The goal of spatial attention in a visual task, on the other hand, can be thought of as focusing on the area of interest and ignoring the other, irrelevant parts, something like cropping an image around the area of interest.
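
As a rough illustration of the two ideas described above, here is a minimal pure-Python sketch (no PyTorch). Several things in it are assumptions made only for illustration: the function names `self_attention` and `spatial_attention` are hypothetical and are not PyTorch APIs, the self-attention uses the raw token vectors as queries, keys and values with no learned projections, and the spatial mask is simply a sigmoid of the channel-wise mean rather than the learned convolution that published spatial-attention modules typically use.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Toy scaled dot-product self-attention with Q = K = V = tokens.

    tokens: list of n vectors, each a list of d floats.
    Every output vector is a softmax-weighted average of all input vectors.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def spatial_attention(feature_map):
    """Toy spatial attention on a feature map indexed as [channel][row][col].

    Assumption: the mask is sigmoid(mean over channels), with no learned weights.
    Returns (reweighted feature map, H x W mask with values in (0, 1)).
    """
    C, H, W = len(feature_map), len(feature_map[0]), len(feature_map[0][0])
    mask = [
        [1.0 / (1.0 + math.exp(-sum(feature_map[c][i][j] for c in range(C)) / C))
         for j in range(W)]
        for i in range(H)
    ]
    out = [
        [[feature_map[c][i][j] * mask[i][j] for j in range(W)] for i in range(H)]
        for c in range(C)
    ]
    return out, mask


# Two toy "word" vectors standing in for "river" and "bank".
tokens = [[1.0, 0.0], [0.0, 1.0]]
mixed = self_attention(tokens)

# A 2-channel, 2x2 feature map with a strong response in the top-left corner.
fmap = [[[4.0, 0.0], [0.0, -4.0]], [[4.0, 0.0], [0.0, -4.0]]]
weighted, mask = spatial_attention(fmap)
```

In this toy case the two output vectors of `self_attention` end up closer to each other than the inputs were, because each one is a mixture of both; that is not guaranteed for arbitrary inputs or for learned projections. The spatial mask is soft: it scales each location by a value between 0 and 1 and applies the same mask to every channel, so it down-weights regions rather than cropping them out.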

Is this description correct?

I found this for self-attention, but I can’t find any implementation of spatial attention. Is there a function for spatial attention in PyTorch?