How to train an attention network?

I am trying to add a spatial attention mechanism to my semantic segmentation network (e.g., U-Net), and I have two questions:

  1. I want to add attention to multiple layers. Should I design a single attention network and resample its output to match the dimensions of each layer, or should I design a separate attention network for each layer?
  2. Should I train the attention network(s) at the same time as the main network, or should they be trained sequentially?
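
To make question 1 concrete, here is a rough sketch of the two options I have in mind. It is forward-pass only and uses NumPy rather than a real framework. The layer names (`enc1`, `enc2`), the shapes, the helpers `resize_nearest` and `apply_spatial_attention`, and the random or channel-mean "attention" maps are all placeholders I made up for illustration, not an actual attention network.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def resize_nearest(att, out_h, out_w):
    """Nearest-neighbour resample of a 2-D map to (out_h, out_w)."""
    in_h, in_w = att.shape
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return att[rows[:, None], cols[None, :]]

def apply_spatial_attention(features, att):
    """Scale a (C, H, W) feature map by an (H, W) attention map."""
    return features * att[None, :, :]

rng = np.random.default_rng(0)

# Hypothetical feature maps from two U-Net levels: (channels, height, width).
feats = {
    "enc1": rng.normal(size=(16, 64, 64)),
    "enc2": rng.normal(size=(32, 32, 32)),
}

# Option A: one attention map (random stand-in for the output of a single
# attention network), resampled to the resolution of each layer.
shared_att = sigmoid(rng.normal(size=(64, 64)))
out_a = {
    name: apply_spatial_attention(f, resize_nearest(shared_att, *f.shape[1:]))
    for name, f in feats.items()
}

# Option B: a separate attention map per layer, computed from that layer's
# own features (channel mean + sigmoid as a stand-in for a per-layer network).
out_b = {
    name: apply_spatial_attention(f, sigmoid(f.mean(axis=0)))
    for name, f in feats.items()
}
```

In both cases the output keeps the shape of the input feature map. The difference is only whether the attention map comes from one shared source or from each layer separately.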

Thank you so much