src_mask: is used to block specific positions from being attended to (e.g. future tokens).
src_key_padding_mask: is used to block attention to PAD tokens.
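For concreteness, here is a minimal sketch of how the two masks could be built and passed to a single nn.TransformerEncoderLayer. The token ids, PAD_ID, vocabulary size and layer sizes are made-up values for illustration only, and it assumes a PyTorch version that supports batch_first=True. With boolean masks, True means "do not attend".

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

seq_len, d_model = 5, 16
PAD_ID = 0  # hypothetical padding token id, chosen for this example

# Two sequences of token ids; the second one ends with two PAD tokens.
tokens = torch.tensor([[5, 7, 2, 9, 4],
                       [3, 8, 6, PAD_ID, PAD_ID]])

embedding = nn.Embedding(10, d_model, padding_idx=PAD_ID)
layer = nn.TransformerEncoderLayer(
    d_model=d_model, nhead=4, dim_feedforward=32, batch_first=True
)
layer.eval()  # disable dropout so the output is deterministic

# src_mask: shape (seq_len, seq_len), shared by the whole batch.
# True at [i, j] means query position i may NOT attend to key position j,
# so the upper triangle blocks all future positions.
src_mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

# src_key_padding_mask: shape (batch, seq_len), one row per sequence.
# True marks PAD positions that no query should attend to.
src_key_padding_mask = tokens.eq(PAD_ID)

out = layer(
    embedding(tokens),
    src_mask=src_mask,
    src_key_padding_mask=src_key_padding_mask,
)
print(out.shape)
```

Note that src_mask depends only on positions, while src_key_padding_mask depends on the content of each sequence in the batch, so the two can be used together or independently.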
However, I'm still not sure whether I need to use src_mask in TransformerEncoderLayer. These are my guesses:
In the language modeling task: we need to generate the next word, and each new word is then used to infer the future words, so we need to use src_mask.
In the seq2seq task (machine translation): we need to generate a whole sequence, so we do not need to use src_mask.
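One way to test the first guess is to check whether an earlier position's output changes when only a later token changes. The sketch below uses random inputs and arbitrary layer sizes (assumptions for illustration, not taken from any real model) and compares a causal src_mask against no mask at all.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

seq_len, d_model = 4, 8
layer = nn.TransformerEncoderLayer(
    d_model=d_model, nhead=2, dim_feedforward=16, batch_first=True
)
layer.eval()  # disable dropout so repeated calls are comparable

x = torch.randn(1, seq_len, d_model)
x_changed = x.clone()
x_changed[0, -1] = torch.randn(d_model)  # replace only the last position

# Causal mask: True above the diagonal blocks attention to future positions.
causal_mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

with_mask_a = layer(x, src_mask=causal_mask)
with_mask_b = layer(x_changed, src_mask=causal_mask)
no_mask_a = layer(x)
no_mask_b = layer(x_changed)

# Compare the outputs at every position except the last one.
earlier_same_with_mask = torch.allclose(
    with_mask_a[0, :-1], with_mask_b[0, :-1], atol=1e-6
)
earlier_same_without_mask = torch.allclose(
    no_mask_a[0, :-1], no_mask_b[0, :-1], atol=1e-6
)
print(earlier_same_with_mask, earlier_same_without_mask)
```

With the causal mask, earlier positions cannot see the changed token, which is what next-word prediction needs. Without it, every position sees the whole input. That is usually what you want for the source sentence on the encoder side of a translation model, where src_key_padding_mask is typically still needed for padded batches.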