If I have padding for the input to a TransformerDecoder, does the padding also apply to the output automatically?
I want the decoder to produce predictions at positions where the input was padding. Is this possible?
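
A minimal sketch of the situation, assuming the question is about PyTorch's `torch.nn.TransformerDecoder` (the library is not named above, so that is an assumption). All sizes and tensors are made up for illustration. It passes a `tgt_key_padding_mask` and then checks what comes out at the padded positions:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for this illustration.
d_model, nhead, batch, tgt_len, src_len = 16, 4, 2, 5, 6

# dropout=0.0 keeps the sketch deterministic.
layer = nn.TransformerDecoderLayer(
    d_model=d_model, nhead=nhead, dim_feedforward=32,
    dropout=0.0, batch_first=True,
)
decoder = nn.TransformerDecoder(layer, num_layers=2)

tgt = torch.randn(batch, tgt_len, d_model)     # decoder input embeddings
memory = torch.randn(batch, src_len, d_model)  # encoder output

# True marks a padded position. The second sequence is padded at steps 3 and 4.
tgt_key_padding_mask = torch.tensor([
    [False, False, False, False, False],
    [False, False, False, True, True],
])

# Causal mask: True above the diagonal means "may not attend to this key".
causal_mask = torch.triu(
    torch.ones(tgt_len, tgt_len, dtype=torch.bool), diagonal=1
)

out = decoder(
    tgt, memory,
    tgt_mask=causal_mask,
    tgt_key_padding_mask=tgt_key_padding_mask,
)

# The output has one vector per input position, padded or not.
print(out.shape)

# Outputs at the padded positions of the second sequence.
padded_out = out[1, 3:]
print(torch.isfinite(padded_out).all().item())
print((padded_out.abs().sum() > 0).item())
```

In this sketch the padding mask only stops other positions from attending to the padded keys. The decoder still returns a finite, non-zero vector at every padded position, so nothing masks the output automatically. Whether those vectors are useful predictions depends on how the loss treats those positions during training.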