Stacking a couple of ResNet blocks, each with a self-attention module

@ptrblck I think I want to do something like this, except I am unsure how to add self-attention to the intermediate ResNet blocks (rough sketch of what I mean below).

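To make the question concrete, here is a rough sketch of what I have in mind. I'm using `nn.MultiheadAttention` over the flattened spatial positions purely as a placeholder self-attention block, and the class names (`SpatialSelfAttention`, `ResNetWithAttention`) are just mine for illustration; I don't know if this is the right way to do it:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18


class SpatialSelfAttention(nn.Module):
    """Self-attention over the spatial positions of a CNN feature map.

    Placeholder: wraps nn.MultiheadAttention by flattening HxW into a sequence.
    Any of the modules linked below could be swapped in instead.
    """

    def __init__(self, channels, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x):
        b, c, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)        # (B, H*W, C)
        attn_out, _ = self.attn(seq, seq, seq)    # attention over spatial positions
        seq = self.norm(seq + attn_out)           # residual connection + norm
        return seq.transpose(1, 2).reshape(b, c, h, w)


class ResNetWithAttention(nn.Module):
    """ResNet-18 backbone with self-attention inserted after intermediate stages."""

    def __init__(self, num_classes=10):
        super().__init__()
        backbone = resnet18(weights=None)
        self.stem = nn.Sequential(
            backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool
        )
        self.layer1 = backbone.layer1             # 64-channel stage
        self.attn1 = SpatialSelfAttention(64)     # attention after stage 1
        self.layer2 = backbone.layer2             # 128-channel stage
        self.attn2 = SpatialSelfAttention(128)    # attention after stage 2
        self.layer3 = backbone.layer3
        self.layer4 = backbone.layer4
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(512, num_classes)

    def forward(self, x):
        x = self.stem(x)
        x = self.attn1(self.layer1(x))
        x = self.attn2(self.layer2(x))
        x = self.layer3(x)
        x = self.layer4(x)
        return self.fc(self.pool(x).flatten(1))


if __name__ == "__main__":
    model = ResNetWithAttention()
    print(model(torch.randn(2, 3, 224, 224)).shape)  # torch.Size([2, 10])
```

(I realize full attention over 56x56 feature maps is expensive, which is partly why I'm asking about the alternatives below.)
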
Also, for self-attention there are so many options. Is there any built-in module within PyTorch that you would suggest?
I found these alternatives; what is your take?

  1. Attention in image classification - #3 by AdilZouitine
  2. GitHub - Chenglin-Yang/LESA: Locally Enhanced Self-Attention: Rethinking Self-Attention as Local and Context Terms
  3. GitHub - leaderj1001/Stand-Alone-Self-Attention: Implementing Stand-Alone Self-Attention in Vision Models using Pytorch