Hi, I am trying to build a variational autoencoder (VAE) in PyTorch, and I would like to use a transformer for both the encoder and the decoder, but I'm not sure how to go about it.
I have already implemented the tokenization of my dataset, meaning that each word of a sentence is mapped to an integer.
The input to my encoder would therefore be a tensor of shape (batch_size, sent_length) containing integers between 0 and the vocabulary size of my dataset.
How can I build a transformer that maps this input to a latent representation of shape (batch_size, latent_dim)? Is there a tutorial for this use case?
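For reference, here is a minimal sketch of the kind of encoder I'm imagining, just so the shapes are concrete (the hyperparameters like `d_model`, `latent_dim`, and the mean-pooling step are placeholders I made up, not something I'm committed to):

```python
import torch
import torch.nn as nn

class TransformerVAEEncoder(nn.Module):
    def __init__(self, vocab_size, d_model=128, nhead=4, num_layers=2, latent_dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        # Separate heads for the mean and log-variance of the latent Gaussian
        self.to_mu = nn.Linear(d_model, latent_dim)
        self.to_logvar = nn.Linear(d_model, latent_dim)

    def forward(self, tokens):
        # tokens: (batch_size, sent_length) integer token ids
        h = self.encoder(self.embed(tokens))  # (batch_size, sent_length, d_model)
        pooled = h.mean(dim=1)                # collapse the sequence dim -> (batch_size, d_model)
        return self.to_mu(pooled), self.to_logvar(pooled)

enc = TransformerVAEEncoder(vocab_size=1000)
mu, logvar = enc(torch.randint(0, 1000, (8, 20)))
print(mu.shape)  # torch.Size([8, 32])
```

Is pooling over the sequence dimension like this a reasonable way to get a fixed-size latent, or is there a better-established approach (e.g. a special token)?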