All the examples I find on Google build self-made attention, but I want to use the official one.
Note that this example uses only the encoder; it's not the full transformer.
Do I get it right that the encoder returns a 2-D array like batch_size × features? Can I get a 3-D array like batch_size × position × features, as with an embedding? I want to try feeding it into a simple RNN.
The nn.TransformerEncoder returns a 3-D tensor with shape [S, N, E], where S is the length of your input sequence, N is the batch size, and E is the feature dimension.
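A minimal sketch of this, with made-up toy dimensions: the encoder's 3-D output has the same [S, N, E] layout that nn.RNN expects by default, so it can be fed straight in.

```python
import torch
import torch.nn as nn

# Toy dimensions, chosen for illustration:
# S = sequence length, N = batch size, E = feature/embedding size.
S, N, E = 10, 4, 32

encoder_layer = nn.TransformerEncoderLayer(d_model=E, nhead=4)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)

x = torch.rand(S, N, E)   # input: [S, N, E]
out = encoder(x)          # output keeps the same shape: [S, N, E]
print(out.shape)          # torch.Size([10, 4, 32])

# nn.RNN also defaults to [seq_len, batch, input_size],
# so the encoder output can be passed in directly.
rnn = nn.RNN(input_size=E, hidden_size=16)
rnn_out, h_n = rnn(out)
print(rnn_out.shape)      # torch.Size([10, 4, 16])
```

Note the shape is [S, N, E], not [N, S, E]: the batch dimension comes second unless you construct the layers with batch_first=True.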