Imagine I want to feed an image to a couple of stacked ResNet blocks. How should I stack the ResNet blocks in code, and how can I attach self-attention modules to them? Could you please suggest some code that might look like this architecture?
These are natural images.
I am open to using either resnet18 or resnet50.
Also, at the very end, how does the final embedding combine all of these?
@ptrblck I think I want to do something like this, except I am unsure how to add self-attention to the intermediate ResNet blocks. Here is a rough sketch of what I have in mind, so you can see where I am stuck (see the code below).
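This is only my guess at the wiring, assuming torchvision's resnet18 as the backbone. `SelfAttention2d` is a placeholder module I wrote myself (a naive single-head spatial attention with a residual connection, loosely SAGAN-style), not anything built into PyTorch, and the choice to insert it after layer2 and layer3 is arbitrary:

```python
import torch
import torch.nn as nn
from torchvision import models


class SelfAttention2d(nn.Module):
    """My naive attempt: single-head self-attention over spatial positions."""

    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learnable residual weight

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (B, HW, C//8)
        k = self.key(x).flatten(2)                     # (B, C//8, HW)
        attn = torch.softmax(q @ k / (k.shape[1] ** 0.5), dim=-1)  # (B, HW, HW)
        v = self.value(x).flatten(2)                   # (B, C, HW)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return x + self.gamma * out                    # residual connection


class ResNetWithAttention(nn.Module):
    def __init__(self, embed_dim=128):
        super().__init__()
        backbone = models.resnet18(weights=None)
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1,
                                  backbone.relu, backbone.maxpool)
        # Reuse the four stacked residual stages and interleave attention.
        self.layer1 = backbone.layer1          # 64 channels
        self.layer2 = backbone.layer2          # 128 channels
        self.attn2 = SelfAttention2d(128)
        self.layer3 = backbone.layer3          # 256 channels
        self.attn3 = SelfAttention2d(256)
        self.layer4 = backbone.layer4          # 512 channels
        # Final embedding: global average pool, then a linear projection.
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(512, embed_dim)

    def forward(self, x):
        x = self.stem(x)
        x = self.layer1(x)
        x = self.attn2(self.layer2(x))
        x = self.attn3(self.layer3(x))
        x = self.layer4(x)
        x = self.pool(x).flatten(1)            # (B, 512)
        return self.fc(x)                      # (B, embed_dim)


model = ResNetWithAttention()
emb = model(torch.randn(2, 3, 224, 224))
print(emb.shape)  # torch.Size([2, 128])
```

Is the global-average-pool + linear layer at the end a reasonable way to combine everything into the final embedding, or should the attention outputs be pooled separately and concatenated?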
Also, for self-attention there are so many options. Is there a built-in module within PyTorch that you would suggest?
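The closest built-in I could find is `torch.nn.MultiheadAttention`. My attempt at applying it to a CNN feature map, by flattening the spatial dimensions into a sequence of per-pixel tokens, though I am not sure this is the intended usage:

```python
import torch
import torch.nn as nn

feat = torch.randn(2, 256, 14, 14)             # (B, C, H, W) from a ResNet stage
b, c, h, w = feat.shape
seq = feat.flatten(2).transpose(1, 2)          # (B, HW, C): one token per pixel
mha = nn.MultiheadAttention(embed_dim=c, num_heads=8, batch_first=True)
out, _ = mha(seq, seq, seq)                    # self-attention: query = key = value
feat = out.transpose(1, 2).view(b, c, h, w)    # back to (B, C, H, W)
```

Would something like this be preferable to a hand-rolled module like the one above?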
I found these alternatives; what is your take on them?