Unknown length outputs for RNNs

For sequence-to-sequence models (for natural language translation, for instance) you may want an LSTM or GRU to output a sequence of unknown length. I'm not saying simply that the length varies between training examples, but that the length of the output may not be known at test time. How does PyTorch deal with this? I assume there's maybe some maximum output length specified and padding or something, but how does this work with the loss function? I haven't been able to find an example of exactly this online, otherwise I wouldn't be asking here.

Thanks,
Jack

You would have to handle this use case yourself; PyTorch doesn't limit the approaches by assuming a max length or anything like that.
E.g. if you are working with variable input shapes and would like to classify each sample in the temporal dimension, you could just use a target with the same (variable) sequence length.
On the other hand, you might want to classify the complete sequence to a specific class, so while the input would have a (variable) temporal dimension, your target should have a class index for each sample.

The processing highly depends on your use case.
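For the first case, a minimal sketch of per-timestep classification with a matching variable-length target could look like this (all module sizes and shapes here are made-up assumptions):

```python
import torch
import torch.nn as nn

# assumed toy sizes: 8 input features, 16 hidden units, 5 classes
rnn = nn.GRU(input_size=8, hidden_size=16)
fc = nn.Linear(16, 5)
criterion = nn.CrossEntropyLoss()

seq_len = 12                              # varies from sample to sample
x = torch.randn(seq_len, 1, 8)            # [seq_len, batch_size=1, features]
target = torch.randint(0, 5, (seq_len,))  # one class index per timestep

out, _ = rnn(x)                           # [seq_len, 1, 16]
logits = fc(out.squeeze(1))               # [seq_len, 5]
loss = criterion(logits, target)          # target length matches input length
```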

E.g. if you are working with variable input shapes and would like to classify each sample in the temporal dimension, you could just use a target with the same (variable) sequence length.

I think I could do this by using the hidden state output of the LSTM with hidden.repeat(length), but it's a little trickier with a batch of examples of different lengths, right? (I think I could still do it by repeating each hidden state separately and then using pack_sequence to pass these into the decoder.)
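Something like this is what I have in mind (a rough sketch with made-up sizes; I'm assuming pack_sequence with enforce_sorted=False takes care of the different lengths):

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_sequence

hidden_size = 16
# hypothetical per-example encoder hidden states and desired output lengths
hiddens = [torch.randn(hidden_size), torch.randn(hidden_size)]
lengths = [5, 3]

# repeat each hidden state along a new temporal dimension ...
repeated = [h.unsqueeze(0).repeat(l, 1) for h, l in zip(hiddens, lengths)]

# ... then pack the list so the decoder can consume the whole batch at once
packed = pack_sequence(repeated, enforce_sorted=False)

decoder = nn.LSTM(input_size=hidden_size, hidden_size=32)
output, _ = decoder(packed)  # output is a PackedSequence as well
```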

On the other hand, you might want to classify the complete sequence to a specific class, so while the input would have a (variable) temporal dimension, your target should have a class index for each sample.

I understand this approach too. I think this is perhaps the simplest case.

What I’m wondering most about is things like language translation where the output is not length 1 and it is also not necessarily the length of the input. Do you know how you’d do that? Especially while still using batches?

Thanks,
Jack

The output will have the shape [seq_len, batch_size, num_directions*hidden_size], so you could use this instead of repeating the hidden tensor.
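A quick check of those shapes (all sizes are arbitrary assumptions):

```python
import torch

rnn = torch.nn.LSTM(input_size=8, hidden_size=16, bidirectional=True)
x = torch.randn(10, 4, 8)   # [seq_len=10, batch_size=4, input_size=8]
output, (h_n, c_n) = rnn(x)
print(output.shape)         # torch.Size([10, 4, 32]) -> num_directions*hidden_size
```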

These seq2seq models can be used with batched data, e.g. by using the RNN utility functions such as pad_sequence. You could also try to create batches with the same length (or cut some words), but this might not work well for your use case and depends on how variable the lengths in your dataset are.
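E.g. a minimal sketch of batching variable-length inputs with these utilities (toy shapes, just to show the functions):

```python
import torch
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

# three sequences of different lengths, each with 8 features per timestep
seqs = [torch.randn(7, 8), torch.randn(4, 8), torch.randn(5, 8)]
lengths = torch.tensor([7, 4, 5])

padded = pad_sequence(seqs)       # [max_seq_len=7, batch_size=3, 8], zero-padded
packed = pack_padded_sequence(padded, lengths, enforce_sorted=False)

rnn = torch.nn.LSTM(input_size=8, hidden_size=16)
output, (h_n, c_n) = rnn(packed)  # the RNN skips the padded positions
```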

That’s a basic issue when it comes to generating sequences such as sentences and is not PyTorch-specific. Given a standard RNN-based Seq2Seq model with an encoder and a decoder, the decoder generates the next word step by step in a loop. The loop stops if

  • the next predicted word is an end-of-sequence token, e.g., <EOS>
  • the number of words exceeds a max. threshold (but that’s usually just a safeguard)

This is why you need to add <EOS> as the last word of your output sentences when training.
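A rough sketch of such a generation loop (the module sizes and the SOS_IDX/EOS_IDX/MAX_LEN values are my own placeholder assumptions, not fixed by anything in PyTorch):

```python
import torch
import torch.nn as nn

SOS_IDX, EOS_IDX, MAX_LEN = 0, 1, 50  # assumed special tokens and safeguard
VOCAB, EMB, HID = 1000, 32, 64        # assumed sizes

embedding = nn.Embedding(VOCAB, EMB)
decoder = nn.GRU(input_size=EMB, hidden_size=HID)
out_proj = nn.Linear(HID, VOCAB)

def greedy_decode(hidden):
    """Generate words until <EOS> is predicted or MAX_LEN is reached."""
    token = torch.tensor([[SOS_IDX]])                # [seq_len=1, batch_size=1]
    generated = []
    for _ in range(MAX_LEN):                         # safeguard threshold
        emb = embedding(token)                       # [1, 1, EMB]
        out, hidden = decoder(emb, hidden)
        logits = out_proj(out.squeeze(0))            # [1, VOCAB]
        token = logits.argmax(dim=-1, keepdim=True)  # next predicted word
        if token.item() == EOS_IDX:                  # stop at end-of-sequence
            break
        generated.append(token.item())
    return generated

# the initial hidden state would come from the encoder; random here
words = greedy_decode(torch.zeros(1, 1, HID))
```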

In each run through the loop the decoder tries to predict the next word, which in turn gives you a loss, just like for any prediction task. You then sum up those losses over the whole loop.
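The training side could then sum the per-step losses like this (same assumed setup as the sketch above; teacher forcing, i.e. feeding the ground-truth word back in, is one common choice here, not the only one):

```python
import torch
import torch.nn as nn

SOS_IDX, VOCAB, EMB, HID = 0, 1000, 32, 64  # same assumed setup as above

embedding = nn.Embedding(VOCAB, EMB)
decoder = nn.GRU(input_size=EMB, hidden_size=HID)
out_proj = nn.Linear(HID, VOCAB)
criterion = nn.CrossEntropyLoss()

def step_losses(hidden, target):
    """Sum the per-word losses; `target` already ends with the <EOS> index."""
    token = torch.tensor([[SOS_IDX]])
    loss = torch.zeros(())
    for t in target:                        # one prediction per target word
        emb = embedding(token)
        out, hidden = decoder(emb, hidden)
        logits = out_proj(out.squeeze(0))   # [1, VOCAB]
        loss = loss + criterion(logits, t.view(1))
        token = t.view(1, 1)                # teacher forcing
    return loss

target = torch.tensor([5, 42, 7, 1])        # toy sentence ending in <EOS>=1
loss = step_losses(torch.zeros(1, 1, HID), target)
loss.backward()
```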

Note that these simple descriptions assume a batch size of 1. For larger batch sizes, things get a bit more complicated. As @ptrblck mentioned, for training I use the workaround of generating batches with matching input and output lengths. For inference, however, I use a batch size of 1.
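The bucketing workaround can be as simple as grouping pairs by their length combination (a sketch with toy token lists):

```python
from collections import defaultdict

# toy (source, target) token-index pairs
pairs = [([1, 2, 3], [4, 5]),
         ([6, 7], [8]),
         ([9, 2, 4], [3, 7])]

buckets = defaultdict(list)
for src, tgt in pairs:
    buckets[(len(src), len(tgt))].append((src, tgt))

# every pair in a bucket shares input and output length,
# so each bucket can be batched without any padding
for (src_len, tgt_len), batch in buckets.items():
    print(src_len, tgt_len, len(batch))
```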
