I am new to Transformers.
My question is this, and I will explain it with an example.
I know there are two types of masks:
the subsequent (causal) mask and the padding mask. (Of course there is also a memory mask in nn.Transformer, but it was not in the original Transformer, so I omitted it.)
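To make sure I have the two mask types straight, here is how I would build them (the sizes are just for illustration, and the padding pattern is made up):

```python
import torch

tgt_len, src_len = 4, 5

# subsequent (causal) mask: float mask with -inf above the diagonal,
# so position i cannot attend to positions > i
subsequent_mask = torch.triu(
    torch.full((tgt_len, tgt_len), float("-inf")), diagonal=1
)

# padding mask: bool tensor of shape (batch, src_len),
# True at positions that are padding and should be ignored
src_key_padding_mask = torch.tensor(
    [[False, False, False, True, True]]  # last 2 source tokens are padding
)

print(subsequent_mask)
print(src_key_padding_mask)
```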
The memory is the output of the encoder, right?
So is it right that memory_key_padding_mask (in TransformerDecoder) is the same as src_key_padding_mask (in nn.TransformerEncoder)?
Or, to keep it simple: when I want to reproduce the original Transformer (from the paper), is it right that I don't have to pass memory_mask at all, and memory_key_padding_mask is just the src_key_padding_mask reused?
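To make the question concrete, this is how I am currently calling the model. The shapes and sizes are arbitrary, and the line I am unsure about is the one reusing the source padding mask as memory_key_padding_mask:

```python
import torch
import torch.nn as nn

model = nn.Transformer(d_model=32, nhead=4, batch_first=True)

src = torch.rand(2, 10, 32)  # (batch, src_len, d_model)
tgt = torch.rand(2, 7, 32)   # (batch, tgt_len, d_model)

# subsequent (causal) mask for decoder self-attention
tgt_mask = model.generate_square_subsequent_mask(7)

# True marks padded source positions
src_key_padding_mask = torch.zeros(2, 10, dtype=torch.bool)
src_key_padding_mask[:, 8:] = True  # pretend the last 2 src tokens are padding

out = model(
    src, tgt,
    tgt_mask=tgt_mask,
    src_key_padding_mask=src_key_padding_mask,
    # this is my assumption: reuse the src padding mask for the memory,
    # and pass no memory_mask at all
    memory_key_padding_mask=src_key_padding_mask,
)
print(out.shape)  # (batch, tgt_len, d_model)
```

Is this how the original paper's setup maps onto nn.Transformer's arguments?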