Does my approach make sense? CNN-LSTM

I see. Should I just place it after the feature extraction layer?
Also, I found this thread: Attention in image classification - #3 by AdilZouitine