When to use src_mask in nn.TransformerEncoderLayer

yipliu · July 25, 2022, 2:29pm

TransformerEncoderLayer

src_mask: is used to block specific positions from being attended to (for example, future tokens).

src_key_padding_mask: is used to block attention to PAD tokens.
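To make the two arguments concrete, here is a minimal sketch of how both masks could be built and passed to a single layer. The sizes, the padding pattern, and the variable names (causal_mask, padding_mask) are made up for illustration, and it assumes boolean masks where True means "do not attend".

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes only (not taken from any real model).
batch, seq_len, d_model = 2, 4, 8

# dropout=0.0 keeps the forward pass deterministic for this demo.
layer = nn.TransformerEncoderLayer(
    d_model=d_model, nhead=2, dim_feedforward=16,
    dropout=0.0, batch_first=True,
)

x = torch.randn(batch, seq_len, d_model)

# src_mask has shape (seq_len, seq_len) and is shared by every sequence
# in the batch. True above the diagonal: position i cannot see j > i.
causal_mask = torch.triu(
    torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1
)

# src_key_padding_mask has shape (batch, seq_len) and is per sequence.
# True marks PAD positions; here the second sequence ends in two PADs.
padding_mask = torch.tensor([
    [False, False, False, False],
    [False, False, True, True],
])

out = layer(x, src_mask=causal_mask, src_key_padding_mask=padding_mask)
print(out.shape)
```

The shapes show the main difference: src_mask says which position may look at which position, while src_key_padding_mask says which tokens of each individual sequence are padding.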

However, I’m still not sure whether I need to pass src_mask to TransformerEncoderLayer. These are my guesses:

In the language modeling task: we need to generate the next word, and each new word is then used to infer the future words, so we need to use src_mask.
In the seq2seq task (machine translation): we need to generate a sequence, so we do not need to use src_mask.
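One way to check the first guess is to see what a causal src_mask actually changes. The sketch below is only an experiment under assumed toy sizes, not a statement about how any particular model is trained: it replaces the last token and compares the outputs at the earlier positions, with and without the mask.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes only.
seq_len, d_model = 5, 8

# dropout=0.0 so that repeated forward passes are comparable.
layer = nn.TransformerEncoderLayer(
    d_model=d_model, nhead=2, dim_feedforward=16,
    dropout=0.0, batch_first=True,
)

# True above the diagonal: position i cannot attend to positions j > i.
causal_mask = torch.triu(
    torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1
)

x = torch.randn(1, seq_len, d_model)
x_changed = x.clone()
x_changed[:, -1] = torch.randn(d_model)  # replace only the last token

masked_a = layer(x, src_mask=causal_mask).detach()
masked_b = layer(x_changed, src_mask=causal_mask).detach()
unmasked_a = layer(x).detach()
unmasked_b = layer(x_changed).detach()

# Compare the outputs at every position except the last one.
masked_same = torch.allclose(masked_a[:, :-1], masked_b[:, :-1], atol=1e-6)
unmasked_same = torch.allclose(unmasked_a[:, :-1], unmasked_b[:, :-1], atol=1e-6)

print("earlier outputs unchanged with src_mask:", masked_same)
print("earlier outputs unchanged without src_mask:", unmasked_same)
```

With the mask, the earlier positions cannot be influenced by the last token, which is the property a next-word predictor relies on. Without it, every position can see the whole sequence.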