When to use src_mask in nn.TransformerEncoderLayer

In TransformerEncoderLayer:

src_mask: used to block specific positions from attention (e.g., future tokens).

src_key_padding_mask: used to block attending to PAD tokens.
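To make the shapes concrete, here is a minimal sketch of how I understand the two masks are passed (the layer sizes and the toy padding pattern are made up for illustration):

```python
import torch
import torch.nn as nn

d_model, nhead, seq_len, batch_size = 16, 4, 5, 2
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead)
src = torch.rand(seq_len, batch_size, d_model)            # (S, N, E), batch_first=False

# src_mask: (S, S), shared by every sequence in the batch;
# True (or -inf for a float mask) means "this position may not be attended to".
src_mask = torch.zeros(seq_len, seq_len, dtype=torch.bool)

# src_key_padding_mask: (N, S); True marks a PAD token that should be ignored.
src_key_padding_mask = torch.tensor([
    [False, False, False, True, True],    # sequence 1 ends with 2 PAD tokens
    [False, False, False, False, False],  # sequence 2 has no padding
])

out = layer(src, src_mask=src_mask, src_key_padding_mask=src_key_padding_mask)
print(out.shape)  # torch.Size([5, 2, 16])
```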

However, I’m still not sure whether I need to use src_mask in TransformerEncoderLayer. The following are my guesses:

  1. In the language-model task: we need to predict the next word, so a position must not be allowed to attend to the future words it is supposed to predict. Here we need src_mask (a causal mask; see the sketch after this list).

  2. In the seq2seq task (machine translation): the encoder sees the whole source sentence at once while the decoder generates the sequence, so we do not need src_mask on the encoder (only src_key_padding_mask for PAD tokens).
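If these guesses are right, the two cases would look roughly like this (a sketch with a hand-built causal mask and a made-up padding pattern):

```python
import torch
import torch.nn as nn

d_model, nhead, seq_len, batch_size = 16, 4, 5, 2
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead)
src = torch.rand(seq_len, batch_size, d_model)  # (S, N, E)

# Case 1 (language model): causal src_mask so position i attends only to positions <= i.
# -inf above the diagonal means "blocked" (this is the same pattern that
# nn.Transformer.generate_square_subsequent_mask produces).
causal_mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
lm_out = layer(src, src_mask=causal_mask)

# Case 2 (machine-translation encoder): no src_mask, only mask the PAD tokens.
pad_mask = torch.tensor([
    [False, False, False, True, True],    # sequence 1 ends with 2 PAD tokens
    [False, False, False, False, False],  # sequence 2 has no padding
])
enc_out = layer(src, src_key_padding_mask=pad_mask)

print(lm_out.shape, enc_out.shape)  # torch.Size([5, 2, 16]) torch.Size([5, 2, 16])
```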