What could be used to replace LSTM sentence encoding with the Transformer model?
Currently I am using nn.LSTM, which accepts a packed sequence constructed from an input of shape
(batch_size, max_seq_len, embed1) = (128, 20, 1024)
and it outputs
(1, batch_size, embed2) = (1, 128, 2048)
I.e., we learn a single embedding from the max_seq_len per-token embeddings.
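For reference, here is a minimal sketch of my current setup (variable names, the random data, and the sequence lengths are just placeholders):

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

batch_size, max_seq_len, embed1, embed2 = 128, 20, 1024, 2048

lstm = nn.LSTM(input_size=embed1, hidden_size=embed2, batch_first=True)

x = torch.randn(batch_size, max_seq_len, embed1)          # (128, 20, 1024)
lengths = torch.randint(1, max_seq_len + 1, (batch_size,))  # placeholder lengths
packed = pack_padded_sequence(x, lengths, batch_first=True, enforce_sorted=False)

# h_n is the final hidden state: (num_layers * num_directions, batch, hidden)
_, (h_n, _) = lstm(packed)
print(h_n.shape)  # torch.Size([1, 128, 2048])
```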
How can I replace the LSTM with a Transformer to achieve the same result with exactly the same input?
I.e., the input is a tensor of size (128, 20, 1024) and the output is a tensor of size (1, 128, 2048).
Can I achieve this with nn.Transformer or nn.TransformerEncoder?
So far I could not figure out how to achieve this with these classes.
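The closest I could come up with is something like the sketch below, using nn.TransformerEncoder, mean pooling over the sequence dimension, and a linear projection from 1024 up to 2048 (the nhead and num_layers values, the pooling choice, and the projection are my own guesses, not something I found in the docs). Is this the intended way?

```python
import torch
import torch.nn as nn

batch_size, max_seq_len, embed1, embed2 = 128, 20, 1024, 2048

encoder_layer = nn.TransformerEncoderLayer(d_model=embed1, nhead=8, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=1)
proj = nn.Linear(embed1, embed2)  # guessed projection to reach the 2048-dim output

x = torch.randn(batch_size, max_seq_len, embed1)                # (128, 20, 1024)
pad_mask = torch.zeros(batch_size, max_seq_len, dtype=torch.bool)  # True = padding

out = encoder(x, src_key_padding_mask=pad_mask)  # (128, 20, 1024)
pooled = proj(out.mean(dim=1)).unsqueeze(0)      # (1, 128, 2048)
print(pooled.shape)  # torch.Size([1, 128, 2048])
```

Unlike the LSTM, the Transformer has no final hidden state, so some pooling step (mean, max, or a learned [CLS]-style token) seems unavoidable to collapse the 20 token embeddings into one.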
Thank you