How do I know which masking arguments I need to provide when calling a torch.nn.Transformer model?

The forward method of PyTorch's Transformer implementation, torch.nn.Transformer, has a number of masking arguments, all of which are optional:

forward(src, tgt, src_mask=None, tgt_mask=None, memory_mask=None, src_key_padding_mask=None, tgt_key_padding_mask=None, memory_key_padding_mask=None)


  • src (Tensor) – the sequence to the encoder (required).
  • tgt (Tensor) – the sequence to the decoder (required).
  • src_mask (Optional[Tensor]) – the additive mask for the src sequence
    (optional).
  • tgt_mask (Optional[Tensor]) – the additive mask for the tgt sequence
    (optional).
  • memory_mask (Optional[Tensor]) – the additive mask for the encoder
    output (optional).
  • src_key_padding_mask (Optional[Tensor]) – the Tensor mask for src
    keys per batch (optional).
  • tgt_key_padding_mask (Optional[Tensor]) – the Tensor mask for tgt
    keys per batch (optional).
  • memory_key_padding_mask (Optional[Tensor]) – the Tensor mask for
    memory keys per batch (optional).
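For concreteness, here is a minimal sketch of how I understand these arguments are meant to be passed (toy sizes chosen by me; the causal tgt_mask and the boolean padding masks seem to be the typical combination, but this is exactly what I'm unsure about):

```python
import torch
import torch.nn as nn

# Hypothetical toy dimensions, chosen only to illustrate the expected shapes.
d_model, nhead = 16, 4
batch, src_len, tgt_len = 2, 5, 7

model = nn.Transformer(d_model=d_model, nhead=nhead,
                       num_encoder_layers=1, num_decoder_layers=1)

# Default batch_first=False, so inputs are (seq_len, batch, d_model).
src = torch.rand(src_len, batch, d_model)  # (S, N, E)
tgt = torch.rand(tgt_len, batch, d_model)  # (T, N, E)

# Causal (look-ahead) mask for the decoder: additive float mask,
# -inf above the diagonal, shape (T, T).
tgt_mask = model.generate_square_subsequent_mask(tgt_len)

# Boolean padding masks: True marks positions to be ignored.
# Here nothing is padded, so they are all False.
src_key_padding_mask = torch.zeros(batch, src_len, dtype=torch.bool)  # (N, S)
tgt_key_padding_mask = torch.zeros(batch, tgt_len, dtype=torch.bool)  # (N, T)

out = model(src, tgt,
            tgt_mask=tgt_mask,
            src_key_padding_mask=src_key_padding_mask,
            tgt_key_padding_mask=tgt_key_padding_mask,
            memory_key_padding_mask=src_key_padding_mask)
print(out.shape)  # (T, N, E)
```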

If I just want to use a vanilla Transformer, which masking arguments should I provide, and what happens if I omit them?

This is confusing to me because I have seen people provide masking arguments selectively.

For example, this one provides everything except memory_mask to the TransformerDecoder layer. How do I know whether I should provide it or not?

Another example: this one uses a look-ahead mask (upper triangular) in the encoder.
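By "look-ahead mask" I mean the usual additive upper-triangular mask, which as far as I understand can be built like this:

```python
import torch

sz = 4
# -inf strictly above the diagonal, 0 on and below it,
# so each position can only attend to itself and earlier positions.
mask = torch.triu(torch.full((sz, sz), float('-inf')), diagonal=1)
print(mask)
```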

And in this example, the author didn't pass any mask at all.