pack_padded_sequence and pad_packed_sequence with variable-length input that starts at different time steps

sf23 · March 29, 2020, 1:16am

Hi, based on my understanding of the pack_padded_sequence and pad_packed_sequence functions for RNNs in PyTorch, it seems that variable-length sequences are expected to be padded at the end.

For example (where the padding value is 0):

[1, 2, 3, 4, 0]
[1, 2, 0, 0, 0]
[1, 0, 0, 0, 0]
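For reference, here is a minimal sketch of that usual right-padded workflow. It assumes batch_first tensors, treats each value as a one-dimensional feature, and uses a GRU with hidden size 8 chosen only for illustration:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

# Right-padded batch from the example above (padding value 0), shape (batch, time)
batch = torch.tensor([
    [1, 2, 3, 4, 0],
    [1, 2, 0, 0, 0],
    [1, 0, 0, 0, 0],
])
lengths = torch.tensor([4, 2, 1])  # number of real (non-padding) steps per row

# Treat each value as a single feature: (batch, time, features)
inputs = batch.unsqueeze(-1).float()

rnn = nn.GRU(input_size=1, hidden_size=8, batch_first=True)

# pack_padded_sequence keeps only the first lengths[i] steps of row i
packed = pack_padded_sequence(inputs, lengths, batch_first=True, enforce_sorted=True)
packed_out, h_n = rnn(packed)

# Unpack to a padded (batch, time, hidden) tensor; steps past each length are filled with 0
out, out_lengths = pad_packed_sequence(packed_out, batch_first=True, total_length=batch.size(1))
```

Because the padding is never fed to the GRU, h_n holds each sequence's hidden state at its own last real step.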

However, I am wondering if there is any way to work around this, or whether these methods can handle variable-length sequences that start at different "time steps". For example, if I have these sequences (with padding_value = 0):

[0, 0, 1, 2, 3]
[0, 1, 2, 3, 0]
[1, 0, 0, 0, 0]

Is there any way I can handle that in PyTorch, or with pack_padded_sequence and pad_packed_sequence?

Thanks in advance! And apologies if the answer is somehow obvious and I’m missing it.

vdw · March 29, 2020, 11:45am

As far as I know, I don’t think that’s possible. Not 100% sure, though.
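A quick check seems consistent with this: pack_padded_sequence keeps only the first lengths[i] steps of each row, so with offset sequences it packs the leading padding and drops the tail. A minimal sketch, assuming the lengths count only the non-zero values:

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

# Sequences that start at different time steps (0 = padding)
batch = torch.tensor([
    [0, 0, 1, 2, 3],
    [0, 1, 2, 3, 0],
    [1, 0, 0, 0, 0],
]).float()
lengths = torch.tensor([3, 3, 1])  # number of real values in each row

packed = pack_padded_sequence(batch, lengths, batch_first=True, enforce_sorted=False)
unpacked, _ = pad_packed_sequence(packed, batch_first=True)

# Rows come back as [0, 0, 1], [0, 1, 2], [1, 0, 0]:
# the leading zeros were treated as data and the trailing real values were dropped.
print(unpacked)
```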

However, what would be the use case? Or, w.r.t. your example, what’s the difference between [0, 0, 1, 2, 3] and [0, 1, 2, 3, 0]? Both are effectively the same sequence. So you could convert your inputs to:

[1, 2, 3]
[1, 2, 3]
[1, 0, 0]
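A minimal sketch of that conversion, assuming 0 is never a real value and each row's real values form one contiguous block. The helper name left_align is hypothetical; it also returns each row's original start offset in case that information is still needed later:

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence


def left_align(batch: torch.Tensor, pad: int = 0):
    """Hypothetical helper: shift each row so its first non-padding value sits at index 0.

    Assumes `pad` never occurs as a real value and that every row has at least
    one real value, stored contiguously. Returns the right-padded batch
    (trimmed to the longest sequence), the lengths, and the original start offsets.
    """
    mask = batch != pad
    starts = mask.long().argmax(dim=1)   # first True per row (argmax returns the first maximum)
    lengths = mask.sum(dim=1)            # number of non-padding values per row
    aligned = torch.full_like(batch, pad)
    for i, (s, n) in enumerate(zip(starts.tolist(), lengths.tolist())):
        aligned[i, :n] = batch[i, s:s + n]
    return aligned[:, :int(lengths.max())], lengths, starts


batch = torch.tensor([
    [0, 0, 1, 2, 3],
    [0, 1, 2, 3, 0],
    [1, 0, 0, 0, 0],
])
aligned, lengths, starts = left_align(batch)
# aligned: [1, 2, 3], [1, 2, 3], [1, 0, 0]; lengths: [3, 3, 1]; starts: [2, 1, 0]

packed = pack_padded_sequence(aligned.unsqueeze(-1).float(), lengths,
                              batch_first=True, enforce_sorted=False)
```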

sf23 · March 29, 2020, 8:10pm

The idea I want to capture is that these sequences occur at different points in a lifetime. For example, if I have data describing a given person over the span of their lifetime, and one sequence starts at age 50 while another starts at age 35, I’d like to capture these different "start times" by varying their position within a fixed-length sequence.

I suspected as much with pack_padded_sequence / pad_packed_sequence, but any other ideas for how to represent these different start times are also much appreciated.

ebarsoum · March 29, 2020, 9:23pm

Why aren’t these differences in the data itself?
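One way to read this suggestion is to add the absolute time (here, age) as an extra input feature, so sequences can be left-aligned for packing while still carrying when each step happened. A minimal sketch with made-up values, assuming one observation per year and dividing age by 100 as a rough scaling; the data, start ages, and scaling are illustrative assumptions only:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

# Hypothetical data: two people observed yearly, starting at different ages
values = torch.tensor([
    [0.3, 0.5, 0.9],   # person A, 3 observations
    [0.1, 0.4, 0.0],   # person B, 2 observations (last entry is padding)
])
start_age = torch.tensor([50.0, 35.0])   # hypothetical start ages
lengths = torch.tensor([3, 2])

# Absolute age at every step, assuming one observation per year: shape (batch, time)
steps = torch.arange(values.size(1), dtype=torch.float32)
age = start_age.unsqueeze(1) + steps

# Stack the value and the (roughly scaled) age as two features: (batch, time, 2)
inputs = torch.stack([values, age / 100.0], dim=-1)

rnn = nn.LSTM(input_size=2, hidden_size=16, batch_first=True)
packed = pack_padded_sequence(inputs, lengths, batch_first=True, enforce_sorted=False)
_, (h_n, c_n) = rnn(packed)
```

The padded step of person B still gets an age value, but packing ensures the LSTM never reads it.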

vdw · March 30, 2020, 1:42am

Then I would argue that it’s not an issue of padding. Padding is used to “fill up” sequences to enable efficient batch processing. Ideally, a network learns that padding (e.g., index 0) means nothing and can/should be ignored.

For you, there is a semantic meaning: a sequence hasn’t started yet. So your batch would look like:

[4, 4, 1, 2, 3]
[4, 1, 2, 3, 0]
[1, 0, 0, 0, 0]

where index 4 represents “nothing has happened yet” or something similar.
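A minimal sketch of that idea, assuming 0 appears only as trailing padding and the "not started yet" token counts as a real step for packing. nn.Embedding's padding_idx keeps the padding vector at zero and excludes it from gradient updates, while token 4 gets its own learnable embedding (the embedding and hidden sizes are arbitrary):

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

PAD, NOT_STARTED = 0, 4  # 4 marks "nothing has happened yet", as in the example above

batch = torch.tensor([
    [4, 4, 1, 2, 3],
    [4, 1, 2, 3, 0],
    [1, 0, 0, 0, 0],
])
# Only trailing PAD is padding; NOT_STARTED tokens count as real time steps
lengths = (batch != PAD).sum(dim=1)  # [5, 4, 1]

# padding_idx=PAD: that row of the embedding starts at zero and receives no gradient
embed = nn.Embedding(num_embeddings=5, embedding_dim=8, padding_idx=PAD)
rnn = nn.GRU(input_size=8, hidden_size=16, batch_first=True)

packed = pack_padded_sequence(embed(batch), lengths, batch_first=True, enforce_sorted=False)
packed_out, h_n = rnn(packed)
out, out_lengths = pad_packed_sequence(packed_out, batch_first=True)
```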

I’m not really sure if this makes sense semantically, but I would still argue that it’s not the same as padding.

sf23 · March 30, 2020, 6:34pm

Fair enough—this is something I’m thinking about as well. Thank you!

sf23 · March 30, 2020, 6:35pm

Ah, I see what you’re saying. Using a different token to represent “not yet started” steps could be a good idea; I’ll look into this as well. Thanks for your help!

vdw · March 30, 2020, 11:53pm

By the way, this is less a PyTorch question than a conceptual one. No harm asking here, of course, but you might also want to try machine learning forums with a focus on theory (and less on a specific framework).

abediee · July 18, 2020, 5:47am

Did you find a solution to this problem?