I am trying to learn how to implement a seq2seq model in PyTorch.
I have 2 questions:
How do I encode sentences? Suppose I have the sentence “Lionel plays football.” How do I represent this sentence as a tensor? Will it be represented as a vector of length vocab_size with 0 and 1 entries? Where does seq_length come into the LSTM input?
I am having trouble with the decoding part of the seq2seq model. How do I pass both the encoder’s hidden state and the target output to the seq2seq decoder? Once I define an LSTM as my decoder, how do I get a sequence as output?
I think the link to the tutorial is spot on; many of your questions are addressed there.
No. As shown in the tutorial, you usually go to dense embeddings directly, which combines the hypothetical one-hot encoding with a linear layer: each word is stored as an integer index and looked up in an embedding table.
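To make the embedding point concrete, here is a minimal sketch. The toy vocabulary, the special tokens, and embedding_dim = 8 are made up for illustration and are not taken from the tutorial. It also checks that the lookup gives the same result as a one-hot vector multiplied by the embedding weights.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical toy vocabulary; a real one is built from the training corpus.
vocab = {"<pad>": 0, "<sos>": 1, "<eos>": 2,
         "lionel": 3, "plays": 4, "football": 5, ".": 6}

tokens = ["lionel", "plays", "football", "."]
# The sentence is a 1-D tensor of word indices, one entry per token.
indices = torch.tensor([vocab[t] for t in tokens])  # shape: (seq_len,) = (4,)

embedding_dim = 8  # arbitrary size chosen for this sketch
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=embedding_dim)

# Dense representation: one embedding vector per token.
embedded = embedding(indices)  # shape: (seq_len, embedding_dim) = (4, 8)

# The same result via explicit one-hot vectors and a bias-free linear map.
one_hot = F.one_hot(indices, num_classes=len(vocab)).float()  # shape: (4, 7)
via_linear = one_hot @ embedding.weight                       # shape: (4, 8)
```

So seq_length is simply the number of tokens in the sentence, and vocab_size only shows up as the number of rows in the embedding table.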
Generally, you pass in an input of shape (seq_len, batch_size, embedding_dim), which is the default layout for nn.LSTM. How to deal with varying input lengths within a batch is (off the top of my head) not covered in the tutorial. You can either pack the batch of sequences (torch.nn.utils.rnn.pack_sequence and friends) or pad it.
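A rough sketch of both options for a batch of two sentences of different lengths. The sizes and the choice of index 0 as the padding index are assumptions for this example only.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import (pad_sequence, pack_padded_sequence,
                                pad_packed_sequence)

torch.manual_seed(0)
vocab_size, embedding_dim, hidden_size = 10, 8, 16  # arbitrary sizes
PAD_IDX = 0  # assumed padding index

# Two sentences as index tensors, with lengths 4 and 2.
seqs = [torch.tensor([3, 4, 5, 6]), torch.tensor([7, 8])]
lengths = torch.tensor([len(s) for s in seqs])

# Option 1: pad to a common length -> shape (max_seq_len, batch_size).
padded = pad_sequence(seqs, padding_value=PAD_IDX)

embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=PAD_IDX)
lstm = nn.LSTM(embedding_dim, hidden_size)

embedded = embedding(padded)  # shape: (seq_len, batch_size, embedding_dim)

# Option 2: additionally pack, so the LSTM skips the padded positions.
packed = pack_padded_sequence(embedded, lengths, enforce_sorted=False)
packed_out, (h_n, c_n) = lstm(packed)

# Unpack to a padded tensor again: (seq_len, batch_size, hidden_size).
output, out_lengths = pad_packed_sequence(packed_out)
```

With packing, h_n holds the state after the last real token of each sentence rather than after the padding, which matters if you hand it on to a decoder.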
This is shown in the picture right at the top of the tutorial: You pass in the final hidden state of the encoder as the initial hidden state of the decoder, together with a start-of-sequence token as the first input.
Then you either feed each output of the decoder as the next timestep’s input (“usual operation”) or use the corresponding target in place of the output (“teacher forcing”); there is something about that in the tutorial.
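A minimal sketch of such a decoding loop. The Decoder class, SOS_IDX, the sizes, and the 0.5 teacher forcing ratio are all made up for illustration, and the zero tensors stand in for a real encoder's final (h, c) state.

```python
import random
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size, embedding_dim, hidden_size = 10, 8, 16  # arbitrary sizes
SOS_IDX = 1  # assumed index of the start-of-sequence token


class Decoder(nn.Module):
    """Hypothetical one-step decoder: previous token + state -> scores."""

    def __init__(self):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_size)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, token, state):
        # token: (batch_size,) -> (1, batch_size, embedding_dim)
        embedded = self.embedding(token).unsqueeze(0)
        output, state = self.lstm(embedded, state)
        scores = self.out(output.squeeze(0))  # (batch_size, vocab_size)
        return scores, state


decoder = Decoder()
batch_size, target_len = 2, 5

# Stand-in for the encoder's final (h, c); a real encoder would supply this.
state = (torch.zeros(1, batch_size, hidden_size),
         torch.zeros(1, batch_size, hidden_size))
target = torch.randint(2, vocab_size, (target_len, batch_size))

teacher_forcing_ratio = 0.5
token = torch.full((batch_size,), SOS_IDX, dtype=torch.long)
all_scores = []
for t in range(target_len):
    scores, state = decoder(token, state)
    all_scores.append(scores)
    if random.random() < teacher_forcing_ratio:
        token = target[t]                       # teacher forcing
    else:
        token = scores.argmax(dim=1).detach()   # feed back own prediction

all_scores = torch.stack(all_scores)  # (target_len, batch_size, vocab_size)
```

The sequence comes from calling the decoder one step at a time and collecting the per-step scores; at inference time there is no target, so only the argmax branch applies.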
The actual output of the decoder at each timestep should be a vector of scores, just like the final layer of a classifier, with the output vocabulary’s words as classes. You then compare it to the target output, e.g. using NLLLoss as in the tutorial.
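As a small sketch of that comparison, random scores stand in for real decoder outputs below. The shapes and PAD_IDX are assumptions for this example. NLLLoss expects log-probabilities, so a log_softmax is applied first; CrossEntropyLoss on the raw scores gives the same value.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
target_len, batch_size, vocab_size = 5, 2, 10  # arbitrary sizes
PAD_IDX = 0  # assumed padding index, excluded from the loss

# Stand-ins for decoder scores and the target word indices.
scores = torch.randn(target_len, batch_size, vocab_size, requires_grad=True)
target = torch.randint(1, vocab_size, (target_len, batch_size))

# Turn scores into log-probabilities over the vocabulary.
log_probs = F.log_softmax(scores, dim=-1)

# Flatten time and batch so every timestep is one classification example.
criterion = nn.NLLLoss(ignore_index=PAD_IDX)
loss = criterion(log_probs.reshape(-1, vocab_size), target.reshape(-1))
loss.backward()

# Equivalent: CrossEntropyLoss applies log_softmax internally.
ce = nn.CrossEntropyLoss(ignore_index=PAD_IDX)(
    scores.reshape(-1, vocab_size), target.reshape(-1))
```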