nn.Transformer explanation

For the first part of your question, it seems that attn_mask is not set up correctly.
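Without seeing your code I can only guess at the exact problem, so here is a minimal sketch of what a mask is expected to do, assuming you want a causal (no look-ahead) mask. It is plain Python rather than PyTorch, and the helper names (causal_mask, masked_softmax) and the scores are made up for illustration. As far as I know, a float attn_mask is added to the attention scores before the softmax, so 0.0 means "allowed" and -inf means "blocked"; a boolean attn_mask uses True for "blocked", which is easy to get backwards.

```python
import math


def causal_mask(size):
    """Additive float mask: 0.0 where attention is allowed, -inf where it is blocked.

    Row = query position, column = key position. A query may only look at
    keys at the same or an earlier position.
    """
    return [
        [0.0 if key <= query else float("-inf") for key in range(size)]
        for query in range(size)
    ]


def masked_softmax(scores, mask):
    """Row-wise softmax of (scores + mask), mimicking how a float mask is applied."""
    weights = []
    for score_row, mask_row in zip(scores, mask):
        shifted = [s + m for s, m in zip(score_row, mask_row)]
        peak = max(shifted)  # finite, because the diagonal is never blocked
        exps = [math.exp(v - peak) for v in shifted]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Hypothetical raw attention scores for a sequence of length 3.
scores = [[1.0, 2.0, 3.0] for _ in range(3)]
mask = causal_mask(3)
weights = masked_softmax(scores, mask)

for row in weights:
    print([round(w, 3) for w in row])
```

Every blocked position ends up with an attention weight of exactly zero, and each row still sums to 1. In PyTorch, nn.Transformer.generate_square_subsequent_mask(sz) should give you a float mask with this same layout, so comparing your own mask against it is a quick sanity check.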