Residual connection with multi-head attention

Is there a simple example of implementing a residual connection with nn.MultiheadAttention?
I am asking because I wonder what happens to masked elements. For example, if an element is masked in a multi-head attention layer, it then comes back through the residual connection, so that element's information no longer seems to be blocked.
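
To make the setup concrete, here is a minimal sketch of the kind of block I have in mind. The wrapper class `ResidualSelfAttention`, the post-norm layout (add, then LayerNorm), and the toy sizes are my own assumptions for illustration, not something taken from the PyTorch docs. It masks the last token with `key_padding_mask`, then changes that token's input to see which output positions are affected.

```python
import torch
import torch.nn as nn


class ResidualSelfAttention(nn.Module):
    """Hypothetical wrapper: self-attention, then residual add, then LayerNorm."""

    def __init__(self, embed_dim, num_heads):
        super().__init__()
        # dropout defaults to 0.0, so the block is deterministic as written
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, x, key_padding_mask=None):
        # key_padding_mask: (batch, seq_len) bool, True = ignore this position as a key
        attn_out, _ = self.attn(
            x, x, x, key_padding_mask=key_padding_mask, need_weights=False
        )
        # Residual add is position-wise: x[:, i] is only added to attn_out[:, i]
        return self.norm(x + attn_out)


torch.manual_seed(0)
block = ResidualSelfAttention(embed_dim=8, num_heads=2)

x = torch.randn(1, 4, 8)                                 # (batch, seq_len, embed_dim)
pad_mask = torch.tensor([[False, False, False, True]])   # mask the last token
out = block(x, key_padding_mask=pad_mask)

# Replace only the masked token's input and run the block again
x2 = x.clone()
x2[0, 3] = torch.randn(8)
out2 = block(x2, key_padding_mask=pad_mask)

# Do the unmasked positions see the change? Does the masked position itself?
unmasked_same = torch.allclose(out[0, :3], out2[0, :3], atol=1e-5)
masked_changed = not torch.allclose(out[0, 3], out2[0, 3], atol=1e-5)
print("unmasked outputs unchanged:", unmasked_same)
print("masked position output changed:", masked_changed)
```

In this sketch the residual adds each token only to its own position, so the masked token's value reappears at its own output slot but should not reach the other positions through the residual path. Whether that slot is then ignored downstream (for example in the loss or in pooling) is a separate choice that this sketch does not cover.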