Attention in image classification

Hi,

I’m using a ResNet to extract features, and the last layer gives me an output of shape (batch_size × 2048 × 1 × 1).

When I pass this through a SelfAttention module, I get the same dimensions out. Is this the correct behaviour?

Can I flatten this vector and pass it to a fully connected layer in order to classify my own classes?
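
For reference, here is a minimal sketch of my setup. The SelfAttention module below is a hypothetical stand-in (a simplified SAGAN-style block that preserves the input shape) for the one I’m actually using, and the class count in the final layer is just a placeholder:

```python
import torch
import torch.nn as nn
from torchvision import models

# Hypothetical stand-in for the SelfAttention module I'm using:
# it returns a tensor with the same shape as its input.
class SelfAttention(nn.Module):
    def __init__(self, in_channels):
        super().__init__()
        self.query = nn.Conv2d(in_channels, in_channels // 8, kernel_size=1)
        self.key = nn.Conv2d(in_channels, in_channels // 8, kernel_size=1)
        self.value = nn.Conv2d(in_channels, in_channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).view(b, -1, h * w)   # (b, c//8, h*w)
        k = self.key(x).view(b, -1, h * w)     # (b, c//8, h*w)
        v = self.value(x).view(b, -1, h * w)   # (b, c, h*w)
        # attention over spatial positions: (b, h*w, h*w)
        attn = torch.softmax(q.transpose(1, 2) @ k, dim=-1)
        out = v @ attn.transpose(1, 2)         # (b, c, h*w)
        return out.view(b, c, h, w)            # same shape as the input

# ResNet-50 backbone with its final fc layer removed -> (batch, 2048, 1, 1)
backbone = nn.Sequential(*list(models.resnet50(weights=None).children())[:-1])
attention = SelfAttention(2048)
classifier = nn.Linear(2048, 10)  # 10 = placeholder number of my classes

x = torch.randn(4, 3, 224, 224)
features = backbone(x)                     # (4, 2048, 1, 1)
attended = attention(features)             # (4, 2048, 1, 1) -- same shape
logits = classifier(attended.flatten(1))   # flatten to (4, 2048), then classify
print(logits.shape)                        # torch.Size([4, 10])
```

With this, `attended` has the same (batch_size × 2048 × 1 × 1) shape as `features`, which matches what I’m seeing.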

Thank you.