How do I know which masking arguments I need to provide when calling a torch.nn.Transformer model?

The forward method of PyTorch's Transformer implementation, torch.nn.Transformer, has a number of masking arguments, all of which are optional:

forward(src, tgt, src_mask=None, tgt_mask=None, memory_mask=None, src_key_padding_mask=None, tgt_key_padding_mask=None, memory_key_padding_mask=None)


  • src (Tensor) – the sequence to the encoder (required).
  • tgt (Tensor) – the sequence to the decoder (required).
  • src_mask (Optional[Tensor]) – the additive mask for the src sequence
    (optional).
  • tgt_mask (Optional[Tensor]) – the additive mask for the tgt sequence
    (optional).
  • memory_mask (Optional[Tensor]) – the additive mask for the encoder
    output (optional).
  • src_key_padding_mask (Optional[Tensor]) – the Tensor mask for src
    keys per batch (optional).
  • tgt_key_padding_mask (Optional[Tensor]) – the Tensor mask for tgt
    keys per batch (optional).
  • memory_key_padding_mask (Optional[Tensor]) – the Tensor mask for
    memory keys per batch (optional).
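For concreteness, here is a minimal sketch of how I understand these arguments are meant to be passed (toy sizes chosen by me; the causal tgt_mask and the boolean padding masks seem to be the typical combination, but this is exactly what I'm unsure about):

```python
import torch
import torch.nn as nn

# Hypothetical toy dimensions, chosen only to illustrate the expected shapes.
d_model, nhead = 16, 4
batch, src_len, tgt_len = 2, 5, 7

model = nn.Transformer(d_model=d_model, nhead=nhead,
                       num_encoder_layers=1, num_decoder_layers=1)

# Default batch_first=False, so inputs are (seq_len, batch, d_model).
src = torch.rand(src_len, batch, d_model)  # (S, N, E)
tgt = torch.rand(tgt_len, batch, d_model)  # (T, N, E)

# Causal (look-ahead) mask for the decoder: additive float mask,
# -inf above the diagonal, shape (T, T).
tgt_mask = model.generate_square_subsequent_mask(tgt_len)

# Boolean padding masks: True marks positions to be ignored.
# Here nothing is padded, so they are all False.
src_key_padding_mask = torch.zeros(batch, src_len, dtype=torch.bool)  # (N, S)
tgt_key_padding_mask = torch.zeros(batch, tgt_len, dtype=torch.bool)  # (N, T)

out = model(src, tgt,
            tgt_mask=tgt_mask,
            src_key_padding_mask=src_key_padding_mask,
            tgt_key_padding_mask=tgt_key_padding_mask,
            memory_key_padding_mask=src_key_padding_mask)
print(out.shape)  # (T, N, E)
```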

If I just want to use a vanilla Transformer, which masking arguments should I provide, and what happens if I omit them?

This is confusing to me because I have seen people provide masking arguments selectively.

For example, this one provides everything except memory_mask to the TransformerDecoder layer. How do I know whether I should provide it or not?

Another example: this one uses a look-ahead mask (upper triangular) in the encoder.
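By "look-ahead mask" I mean the usual additive upper-triangular mask, which as far as I understand can be built like this:

```python
import torch

sz = 4
# -inf strictly above the diagonal, 0 on and below it,
# so each position can only attend to itself and earlier positions.
mask = torch.triu(torch.full((sz, sz), float('-inf')), diagonal=1)
print(mask)
```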

And in this example, the author didn't pass any mask at all.