I would like to know if a CNN can output le (x,y) point in an image that represents where the attention is on this image.
I’ve heard about attention mechanisms, but most of the example is for NLP.
So, maybe there is a way to output the « point of attention » through the last conv layer of a CNN as an (x,y) point.
I think this can be done by looking at what part of the image the CNN pay attention to classify an image. But my task is more complex. Here I want to do this without the classification task. I just want to know what is the point on this image we are looking at (something like visual perception), I mean, where the user is looking at!
It will be great if you can give me some resources. Example with PyTorch will be excellent.