I have a baseline seq2seq model with an LSTM encoder and decoder. The encoder runs over the input sequence, and its final hidden and cell states are presumed to summarize the entire sequence. This summary is then fed as the initial hidden and cell states of the LSTM decoder, which generates the output sequence token by token.
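For reference, here is a minimal sketch of such a baseline (module names and dimensions are illustrative, not from my actual code):

```python
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, emb_dim, hidden_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)

    def forward(self, src):
        # (h_n, c_n) are presumed to summarize the whole input sequence
        embedded = self.embedding(src)
        _, (h_n, c_n) = self.lstm(embedded)
        return h_n, c_n

class Decoder(nn.Module):
    def __init__(self, vocab_size, emb_dim, hidden_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tgt, state):
        # state is the encoder's final (hidden, cell) pair
        embedded = self.embedding(tgt)
        output, state = self.lstm(embedded, state)
        return self.out(output), state
```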
I have batched sequences of different lengths by padding them at the end (using `torch.nn.utils.rnn.pack_padded_sequence`). Consider the case where a sequence has padding tokens towards the end. If we pass this padded sequence through the LSTM encoder, the final hidden/cell state we retrieve will not be the real one, since that state is only reached after processing the padding tokens as well. Ideally, we would want the hidden state from right after the last non-padding token was processed. How can this ideal situation be realized?
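To make the problem concrete, here is a small self-contained sketch (a single-layer, unidirectional LSTM on a zero-padded batch; all names and sizes are illustrative). The manual `gather` at the end recovers the state I would ideally want, but it only recovers the hidden state, not the cell state, and only in this single-layer case:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=4, hidden_size=8, batch_first=True)

# Two sequences with real lengths 5 and 3, zero-padded to length 5.
lengths = torch.tensor([5, 3])
x = torch.randn(2, 5, 4)
x[1, 3:] = 0.0  # positions 3 and 4 of the second sequence are padding

# Feeding the padded tensor directly: h_n is taken *after* the padding steps.
output, (h_n, c_n) = lstm(x)

# The state I would ideally want: the output at each sequence's last real step.
idx = (lengths - 1).view(-1, 1, 1).expand(-1, 1, output.size(-1))
h_last_real = output.gather(1, idx).squeeze(1)

print(torch.allclose(h_n[0, 0], h_last_real[0]))  # True: sequence 0 has no padding
print(torch.allclose(h_n[0, 1], h_last_real[1]))  # False: padding moved the state
```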