How to use pack_padded_sequence in seq2seq models

Hi!
I want to build a conversational model based on seq2seq.
To use pack_padded_sequence, we need to sort the sentences by length.
In my case, I have two txt files (encoder and decoder) where each line corresponds to one sentence, so, for example, the first question in the encoder file corresponds to the first answer in the decoder file.

q: [5] | a: [1,2,3,4]
q: [6,7,8] | a: [1,2]

Then, when I sort the sequences by length, I get:

q: [6,7,8] | a: [1,2,3,4]
q: [5] | a: [1,2]

So h (hidden state) and c (cell state) from the encoder will no longer match the corresponding sentences in the decoder batch. How can I solve this? Or should I not use pack_padded_sequence at all?

Thanks!


You should keep track of the order somehow. It’s pretty common for seq2seq models to use some kind of attentional input feeding in the decoder, which prevents nn.LSTM from being used for all decoder timesteps in one call; in that case I sort the batch based only on source sentence length and use an unrolled LSTM cell in the decoder. Code that sorts the batch together according to the source sentence length will be in torchtext shortly.
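In the meantime, here is a minimal sketch of the order-tracking idea (the layer sizes, toy tensors, and variable names below are illustrative assumptions, not code from torchtext): sort the source and the target with the same permutation so each question stays paired with its answer, and keep the inverse permutation in case the encoder states need to be put back into the original order.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

# Toy batch from the original post, padded with 0
src = torch.tensor([[5, 0, 0], [6, 7, 8]])        # questions
tgt = torch.tensor([[1, 2, 3, 4], [1, 2, 0, 0]])  # answers
src_lengths = torch.tensor([1, 3])

embedding = nn.Embedding(num_embeddings=10, embedding_dim=8, padding_idx=0)
encoder = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)

# Sort source AND target with the same permutation so the pairs stay aligned
sorted_lengths, sort_idx = src_lengths.sort(descending=True)
src_sorted = src[sort_idx]
tgt_sorted = tgt[sort_idx]
inv_idx = sort_idx.argsort()  # inverse permutation to undo the sort later

packed = pack_padded_sequence(embedding(src_sorted), sorted_lengths.cpu(),
                              batch_first=True)
_, (h, c) = encoder(packed)   # h, c: (num_layers, batch, hidden), in sorted order

# Either decode against tgt_sorted directly, or restore the original order:
h_orig, c_orig = h[:, inv_idx], c[:, inv_idx]
```

(Newer PyTorch versions also accept enforce_sorted=False in pack_padded_sequence, which removes the need to sort manually at all.)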


@jekbradbury If I understand you correctly, I need to unroll the decoder step by step. If so, does that mean I need to use a mask for the loss? Because some sentences in the decoder will still contain padding.
Sorry, this is my first seq2seq model.

Yes, that’s true. The OpenNMT implementation https://github.com/OpenNMT/OpenNMT-py is an example that does this, although it’s a little complicated.
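For reference, a minimal sketch of such a masked loss (the shapes and the padding index below are assumptions, not code from OpenNMT): compute the per-token cross entropy, zero out the positions that correspond to padding, and normalize by the number of real tokens.

```python
import torch
import torch.nn.functional as F

def masked_cross_entropy(logits, targets, pad_idx=0):
    # logits: (batch, tgt_seq_len, vocab_size), targets: (batch, tgt_seq_len)
    vocab_size = logits.size(-1)
    loss = F.cross_entropy(logits.reshape(-1, vocab_size),
                           targets.reshape(-1),
                           reduction='none')              # per-token loss
    mask = (targets.reshape(-1) != pad_idx).float()       # 1 for real tokens
    return (loss * mask).sum() / mask.sum()
```

Passing ignore_index=pad_idx to F.cross_entropy gives the same result without an explicit mask.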


Hello @jekbradbury. This is regarding the loss. I have tried to follow your advice (How can I compute seq2seq loss using mask?) and worked out something on the decoder side where I compute the cross entropy loss and multiply it by the decoder mask. But now I am not able to figure out how I can apply the same rule to the encoder outputs, since the encoder outputs have shape (seq_len, batch_size, hidden_size) while the encoder mask has shape (seq_len, batch_size).
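To make the shapes concrete, here is a minimal sketch (the sizes are made up) of one way the mask could be lined up with the encoder outputs: unsqueeze the (seq_len, batch_size) mask so it broadcasts over the hidden dimension.

```python
import torch

seq_len, batch_size, hidden_size = 7, 4, 16
encoder_outputs = torch.randn(seq_len, batch_size, hidden_size)
encoder_mask = torch.randint(0, 2, (seq_len, batch_size)).float()

# (seq_len, batch_size, 1) broadcasts against (seq_len, batch_size, hidden_size)
masked_outputs = encoder_outputs * encoder_mask.unsqueeze(-1)
```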

After attempting to implement seq2seq with packed sequences, I have come to the conclusion that this is not really feasible, as:

  1. seq2seq requires the last hidden/cell state for each element in the batch. I believe there is no API call to do this when using packed sequences (see the sketch after this list for the kind of manual workaround involved).

  2. The way packed sequences work at the moment is that they require inputs to be sorted by length. As the OP points out, this requires a mirrored re-ordering of the target inputs.
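As a concrete example of the index juggling point 1 ends up requiring, here is a sketch of gathering the last valid top-layer output per batch element after unpacking (the helper name and shapes are assumptions, not an existing API):

```python
import torch
from torch.nn.utils.rnn import pad_packed_sequence

def last_valid_output(packed_output, lengths):
    # packed_output: PackedSequence produced by the encoder LSTM
    # lengths: (batch,) true lengths of the (sorted) source sequences
    outputs, _ = pad_packed_sequence(packed_output)   # (seq_len, batch, hidden)
    idx = (lengths - 1).view(1, -1, 1).expand(1, outputs.size(1), outputs.size(2))
    return outputs.gather(0, idx).squeeze(0)          # (batch, hidden)
```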

It appears to me that, fundamentally, packed sequences require a lot of finagling to work as currently implemented. It feels like they could benefit from a little more abstraction to make things a bit easier for the programmer.

Happy to be corrected/proven wrong here!