Can somebody please point me to a tutorial that clearly explains what each of the TransformerEncoder/TransformerDecoder mask parameters does, and when each one should be used?
Specifically,
- Which mask should I use for invalid tokens in the TransformerEncoder input?
- Which one should I use for invalid tokens in the TransformerDecoder input?
- Which mask should I use to deal with invalid “memory” entries that I need to pass to the TransformerDecoder?
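A minimal sketch of the setup these questions describe, assuming PyTorch's nn.Transformer with batch_first=True and boolean masks, where True marks a position that attention should ignore. The sequence lengths, model sizes and the padding_mask helper are hypothetical. Reusing the source padding mask as memory_key_padding_mask is also an assumption: it holds when the memory is the encoder output for that same source, so it has one entry per source position.

```python
import torch
import torch.nn as nn

def padding_mask(lengths: torch.Tensor, max_len: int) -> torch.Tensor:
    # Hypothetical helper: boolean mask of shape (batch, max_len).
    # True marks padding (ignored by attention), False marks real tokens.
    positions = torch.arange(max_len, device=lengths.device)
    return positions.unsqueeze(0) >= lengths.unsqueeze(1)

torch.manual_seed(0)
d_model, nhead, batch = 16, 4, 2
src_len, tgt_len = 5, 4

# Hypothetical batch: real (unpadded) length of each sequence, all >= 1.
src_lengths = torch.tensor([5, 3])
tgt_lengths = torch.tensor([4, 2])

model = nn.Transformer(d_model=d_model, nhead=nhead,
                       num_encoder_layers=1, num_decoder_layers=1,
                       dim_feedforward=32, dropout=0.0, batch_first=True)

src = torch.randn(batch, src_len, d_model)
tgt = torch.randn(batch, tgt_len, d_model)

src_pad = padding_mask(src_lengths, src_len)  # invalid encoder inputs
tgt_pad = padding_mask(tgt_lengths, tgt_len)  # invalid decoder inputs
mem_pad = src_pad                             # invalid memory entries (assumed = source padding)

# Causal mask (True = may not attend): blocks attention to future target positions.
tgt_causal = torch.triu(torch.ones(tgt_len, tgt_len, dtype=torch.bool), diagonal=1)

out = model(src, tgt,
            tgt_mask=tgt_causal,
            src_key_padding_mask=src_pad,
            tgt_key_padding_mask=tgt_pad,
            memory_key_padding_mask=mem_pad)
print(out.shape)  # torch.Size([2, 4, 16])
print(torch.isnan(out).any().item())
```

Every sequence here keeps at least one unmasked position, so every attention row has at least one key left to attend to.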
So far I have tried src_key_padding_mask, tgt_key_padding_mask and memory_key_padding_mask, respectively, but I get output tensors consisting entirely of NaNs.
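One possible source of NaNs, shown as a simplified sketch rather than PyTorch's actual internals: a boolean padding mask effectively sets ignored attention scores to -inf before the softmax, and if every key in a row is masked, the softmax of an all -inf row is NaN. A sequence that is entirely padding, or a mask with the convention inverted (True on real tokens), can produce such rows. The masked_softmax function and the example scores are hypothetical.

```python
import torch

def masked_softmax(scores: torch.Tensor, ignore: torch.Tensor) -> torch.Tensor:
    # Hypothetical helper: ignored keys get -inf, then softmax over the key dimension.
    return torch.softmax(scores.masked_fill(ignore, float("-inf")), dim=-1)

scores = torch.zeros(2, 3)  # 2 queries, 3 keys, uniform scores
ignore = torch.tensor([[False, True, True],   # one key left: well defined
                       [True, True, True]])   # every key masked
weights = masked_softmax(scores, ignore)
print(weights)
# tensor([[1., 0., 0.],
#         [nan, nan, nan]])
```

Once a NaN appears in one attention row, later layers can spread it further through the output.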
Thanks!