In TransformerDecoder, does tgt_key_padding_mask produce masked predictions too?

caffedude · July 8, 2024, 6:42pm

If I have padding for the input to a TransformerDecoder, does the padding also apply to the output automatically?

I want the decoder to make predictions at positions where the input may have been padding. Is this possible?
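A minimal sketch for checking this empirically, assuming PyTorch 2.x, `batch_first=True`, right padding, and made-up sizes. The causal `tgt_mask` is left out on purpose, so any hiding of padded positions comes from `tgt_key_padding_mask` alone. The sketch checks two things. First, the decoder still returns one output vector for every target position, padded or not. Second, the real positions do not depend on what sits in the padded slots, because the mask removes padded positions only as attention *keys*.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

N, T, S, E = 2, 5, 7, 16  # batch, target length, memory length, model dim (illustrative)

layer = nn.TransformerDecoderLayer(d_model=E, nhead=4, dim_feedforward=32,
                                   dropout=0.0, batch_first=True)
decoder = nn.TransformerDecoder(layer, num_layers=2)

tgt = torch.randn(N, T, E)
memory = torch.randn(N, S, E)

# Right padding: sequence 0 has 5 real tokens, sequence 1 has 3 (last 2 are padding).
lengths = torch.tensor([5, 3])
# True marks a padded position; it is ignored when used as an attention key.
tgt_key_padding_mask = torch.arange(T)[None, :] >= lengths[:, None]

out = decoder(tgt, memory, tgt_key_padding_mask=tgt_key_padding_mask)

# One output vector per target position, including the padded ones.
print(out.shape)  # torch.Size([2, 5, 16])
# The padded positions of sequence 1 still get (non-zero) output vectors.
print(out[1, 3:].abs().sum(dim=-1))

# Overwrite the padded positions with different values and run again.
tgt2 = tgt.clone()
tgt2[1, 3:] = 100.0
out2 = decoder(tgt2, memory, tgt_key_padding_mask=tgt_key_padding_mask)

# The real positions are unchanged, because they never attend to padded keys.
print(torch.allclose(out[1, :3], out2[1, :3], atol=1e-5))  # True
```

In this sketch the mask controls what the *other* positions can attend to. It does not zero out or drop the outputs at padded positions, so whether those outputs are used (for example, in the loss) is left to the training code.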