Transformer encoder for sentence encoding instead of LSTM

How can I replace my LSTM-based sentence encoding with a Transformer model?

Currently I am using nn.LSTM, which accepts a packed sequence constructed from an input of shape
(batch_size, max_seq_len, embed1) = (128, 20, 1024)

and it outputs a tensor of shape
(1, batch_size, embed2) = (1, 128, 2048)

I.e., we learn a single sentence embedding from the max_seq_len token embeddings.

How can I replace the LSTM with a Transformer and get the same result from exactly the same input?
I.e., the input is a tensor of size (128, 20, 1024) and the output is a tensor of size (1, 128, 2048).

Can I achieve this with nn.Transformer or nn.TransformerEncoder?
So far I have not been able to figure out how to do it with these classes.

Thank you