Transformer masks explanation?

Can somebody please point me to a tutorial with a clear explanation of what each of the TransformerEncoder/TransformerDecoder mask parameters does, and when one should use each of them?

Specifically,

  • Which mask should I use for invalid tokens in TransformerEncoder input?
  • Same for invalid tokens in TransformerDecoder input?
  • Which mask should I use to deal with invalid “memory” entries I need to pass to TransformerDecoder?

So far I have tried src_key_padding_mask, tgt_key_padding_mask, and memory_key_padding_mask, respectively, but I am getting output tensors consisting entirely of NaNs.

Thanks!

This is the one I usually refer to: https://pytorch.org/tutorials/beginner/transformer_tutorial.html
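As a quick sketch of how the three padding masks are typically wired up (toy sizes, assuming the default batch_first=False layout): each *_key_padding_mask is a (batch, seq) boolean tensor with True at padding positions, and memory_key_padding_mask is usually just the source padding mask reused, since the memory comes from the encoder. One common cause of an all-NaN output is a row that is masked out entirely (every key masked), because the attention softmax over all -inf values produces NaN.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

S, T, N, E = 5, 4, 2, 8  # src len, tgt len, batch size, d_model (toy values)
model = nn.Transformer(d_model=E, nhead=2,
                       num_encoder_layers=1, num_decoder_layers=1,
                       dim_feedforward=16)

src = torch.randn(S, N, E)  # (seq, batch, feature) -- batch_first=False default
tgt = torch.randn(T, N, E)

# Boolean padding masks, shape (batch, seq): True marks a PAD position to ignore.
src_key_padding_mask = torch.zeros(N, S, dtype=torch.bool)
src_key_padding_mask[0, 3:] = True   # last two src tokens of sample 0 are padding

tgt_key_padding_mask = torch.zeros(N, T, dtype=torch.bool)
tgt_key_padding_mask[0, 3:] = True   # last tgt token of sample 0 is padding

# Causal mask so each target position attends only to earlier positions.
tgt_mask = model.generate_square_subsequent_mask(T)

out = model(src, tgt,
            tgt_mask=tgt_mask,
            src_key_padding_mask=src_key_padding_mask,
            tgt_key_padding_mask=tgt_key_padding_mask,
            # memory is the encoder output, so it reuses the src padding mask
            memory_key_padding_mask=src_key_padding_mask)

print(out.shape)  # torch.Size([4, 2, 8]), i.e. (T, N, E)
```

Note that no sequence here is fully padded, so every attention row still has at least one valid key and the output stays NaN-free.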