In TransformerDecoder, does tgt_key_padding_mask also mask the output predictions?

If I pass tgt_key_padding_mask to mark the padded positions in the input to a TransformerDecoder, is that padding also applied to the output automatically?

I want the decoder to still produce predictions at positions where the input may have been padding. Is this possible?
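
To make the question concrete, here is a minimal sketch of how this could be checked empirically. It assumes PyTorch's `nn.TransformerDecoder` with the default `batch_first=False` layout; the sizes, the random tensors, and names such as `pad_mask`, `padded_out`, and `same_on_real` are made up for illustration. Dropout is set to 0.0 only so that two forward passes can be compared.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

d_model, nhead = 16, 4
# dropout=0.0 so repeated forward passes are deterministic and comparable
layer = nn.TransformerDecoderLayer(
    d_model=d_model, nhead=nhead, dim_feedforward=32, dropout=0.0
)
decoder = nn.TransformerDecoder(layer, num_layers=2)

seq_len, batch, mem_len = 5, 2, 3
tgt = torch.randn(seq_len, batch, d_model)     # (T, N, E)
memory = torch.randn(mem_len, batch, d_model)  # (S, N, E)

# Shape (N, T); True marks padding. The last two positions of sample 0 are padded.
pad_mask = torch.zeros(batch, seq_len, dtype=torch.bool)
pad_mask[0, 3:] = True

out = decoder(tgt, memory, tgt_key_padding_mask=pad_mask)

# Replace the content of the padded positions and run the decoder again.
tgt2 = tgt.clone()
tgt2[3:, 0] = torch.randn(2, d_model)
out2 = decoder(tgt2, memory, tgt_key_padding_mask=pad_mask)

# Outputs at the padded positions of sample 0
padded_out = out[3:, 0]
# Do the non-padded positions ignore what sits in the padded slots?
same_on_real = torch.allclose(out[:3, 0], out2[:3, 0], atol=1e-5)

print(out.shape)
print(padded_out.abs().sum().item())
print(same_on_real)
```

In this sketch the output keeps the same shape as `tgt`, and the rows at padded positions come back as ordinary finite, non-zero vectors, while the non-padded rows are unaffected by whatever is stored in the padded slots. That would suggest the mask only hides padded positions as attention keys and does not blank the corresponding outputs, so excluding them from a loss would have to be a separate step.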