Simple working example of how to use packing for variable-length sequence inputs for RNN

AjayTalati · April 22, 2017, 2:18am

Hi,

Updated - here’s a simple example of how I think you use pack_padded_sequence and pad_packed_sequence, but I don’t know if it’s the right way to use them?

import torch
import torch.nn as nn
from torch.autograd import Variable

batch_size = 3
max_length = 3
hidden_size = 2
n_layers =1

# container
batch_in = torch.zeros((batch_size, 1, max_length))

#data
vec_1 = torch.FloatTensor([[1, 2, 3]])
vec_2 = torch.FloatTensor([[1, 2, 0]])
vec_3 = torch.FloatTensor([[1, 0, 0]])

batch_in[0] = vec_1
batch_in[1] = vec_2
batch_in[2] = vec_3

batch_in = Variable(batch_in)

seq_lengths = [3,2,1] # list of integers holding the true length of each sequence in the batch, in decreasing order

# pack it
pack = torch.nn.utils.rnn.pack_padded_sequence(batch_in, seq_lengths, batch_first=True)

>>> pack
PackedSequence(data=Variable containing:
 1  2  3
 1  2  0
 1  0  0
[torch.FloatTensor of size 3x3]
, batch_sizes=[3])


# initialize
rnn = nn.RNN(max_length, hidden_size, n_layers, batch_first=True) 
h0 = Variable(torch.randn(n_layers, batch_size, hidden_size))

#forward 
out, _ = rnn(pack, h0)

# unpack
unpacked, unpacked_len = torch.nn.utils.rnn.pad_packed_sequence(out)

>>> unpacked
Variable containing:
(0 ,.,.) = 
 -0.7883 -0.7972
  0.3367 -0.6102
  0.1502 -0.4654
[torch.FloatTensor of size 1x3x2]

James @jekbradbury, Adam @apaszke, does this look right to you guys? Can you help with clarifying the docs?

It would be nice to have more simple working examples in the docs. That was one of the real joys of using torch: it was almost as easy to use as numpy.

Thanks a lot for your help

KeithYin · April 22, 2017, 3:36am

The input of the RNN should be a Variable of shape [seq, batch, feature], or [batch, seq, feature] if batch_first=True, according to the docs. The input in your code is [batch, feature, seq].
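
A minimal sketch of the layout fix described above, assuming a recent PyTorch release where plain tensors can be passed to nn.RNN without a Variable wrapper. The variable names are illustrative and not taken from the original post.

```python
import torch

# Three sequences with one feature per step, stored as [batch, feature, seq]
batch_feature_seq = torch.tensor([[[1., 2., 3.]],
                                  [[1., 2., 0.]],
                                  [[1., 0., 0.]]])
print(batch_feature_seq.shape)  # torch.Size([3, 1, 3])

# Swap the last two axes to obtain [batch, seq, feature] for batch_first=True
batch_seq_feature = batch_feature_seq.transpose(1, 2).contiguous()
print(batch_seq_feature.shape)  # torch.Size([3, 3, 1])

# input_size is the feature dimension (1 here), not the sequence length
rnn = torch.nn.RNN(input_size=1, hidden_size=2, batch_first=True)
out, h_n = rnn(batch_seq_feature)
print(out.shape)  # torch.Size([3, 3, 2])
```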

AjayTalati · April 22, 2017, 3:51am

Hi Keith @KeithYin,

thanks a lot for spotting that! I’ve updated the example

Seems like the tedious thing with all this padding stuff is sorting the input vectors by length, before they go into the rnn, and then mapping the output back to the original indexing?
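
For readers on newer releases: pack_padded_sequence accepts an enforce_sorted argument (this sketch assumes PyTorch 1.1 or later). With enforce_sorted=False the sorting and the mapping back to the original order appear to be handled internally, so the batch can stay in its original order. The data below is made up for illustration.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)

# Zero-padded batch in arbitrary (unsorted) order, shape [batch, seq, feature]
batch_in = torch.tensor([[[1.], [0.], [0.]],
                         [[1.], [2.], [3.]],
                         [[1.], [2.], [0.]]])
seq_lengths = torch.tensor([1, 3, 2])  # true length of each sequence

rnn = nn.RNN(input_size=1, hidden_size=2, batch_first=True)

# enforce_sorted=False lets the utility sort by length internally
packed = pack_padded_sequence(batch_in, seq_lengths,
                              batch_first=True, enforce_sorted=False)
out, h_n = rnn(packed)

# The padded output and the lengths come back in the original batch order
unpacked, unpacked_len = pad_packed_sequence(out, batch_first=True)
print(unpacked_len)  # tensor([1, 3, 2])
```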

I’ll keep adding to the example

Cheers,

Aj

itzjustricky · May 30, 2017, 8:17pm

When I run the simple example that you have provided, the content of unpacked_len is [1, 1, 1] and the unpacked variable is as shown above.

I expected unpacked_len as [3, 2, 1] and for unpacked to be of size [3x3x2] (with some zero padding) since normally the output will contain the hidden state for each layer as stated in the docs.

output (seq_len, batch, hidden_size * num_directions): tensor containing the output features (h_k) from the last layer of the RNN, for each k.

Is PackedSequence only designed to help the programmer retrieve the last output (i.e. h_n tensor containing the hidden state for k=seq_len)?

smth · May 30, 2017, 8:23pm

nn.LSTM, nn.RNN etc. all of these only give the last hidden state. Intermediate hidden states are not given or retrievable.
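
Going by the documented return values of nn.RNN, output holds the last layer's hidden state at every time step, while h_n holds the final time step for every layer. On that reading, the reply above applies to the lower layers of a stacked RNN, whose intermediate states are not returned. The 1x3x2 result earlier in the thread would then be explained by the input being read as three sequences of length 1. A small check, with sizes chosen arbitrarily for illustration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, batch, feature, hidden, layers = 5, 2, 3, 4, 2

rnn = nn.RNN(feature, hidden, num_layers=layers)  # batch_first=False by default
x = torch.randn(seq_len, batch, feature)

output, h_n = rnn(x)
print(output.shape)  # torch.Size([5, 2, 4]): last layer, every time step
print(h_n.shape)     # torch.Size([2, 2, 4]): every layer, final time step

# The final time step of output matches the last layer's entry in h_n
same = torch.allclose(output[-1], h_n[-1])
print(same)  # True
```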

itzjustricky · May 30, 2017, 8:33pm

I see. Thank you for clarifying my misunderstanding!

Gopal_Sharma · June 21, 2017, 2:19pm

How exactly are variable-length sequences handled? How do you make sure that there is no backprop from the padded positions of sequences that are shorter than the longest sequence?
Is it the case that, since output of the RNN for smaller sequence length becomes zero after their original lengths, gradients also become zero?
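
One way to probe this empirically is to ask autograd for the gradient with respect to the padded input. The sketch below assumes a recent PyTorch release and uses made-up data. Since a packed sequence only contains the valid time steps, the padded positions never enter the computation, so their gradient should come out as exactly zero.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)

# Zero-padded batch [batch, seq, feature]; track gradients on the input itself
batch_in = torch.tensor([[[1.], [2.], [3.]],
                         [[1.], [2.], [0.]],
                         [[1.], [0.], [0.]]], requires_grad=True)
seq_lengths = [3, 2, 1]

rnn = nn.RNN(input_size=1, hidden_size=2, batch_first=True)
packed = pack_padded_sequence(batch_in, seq_lengths, batch_first=True)
out, _ = rnn(packed)
unpacked, _ = pad_packed_sequence(out, batch_first=True)

# Any scalar loss will do for the check
unpacked.sum().backward()

# Rows are sequences, columns are time steps
grad = batch_in.grad.squeeze(-1)
print(grad)
```

In the printed matrix, the entries at the padded positions (row 1 column 2, row 2 columns 1 and 2) should be zero, while the valid positions carry nonzero gradients.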

matthew_zeng · July 28, 2017, 3:59pm

batch_in size is still incorrect.

It should be [batch, seq, feature_size] if batch_first=True, while batch_in is [batch, feature, seq] in your example.

nitish · September 27, 2017, 5:59am

I have a follow-up question. After 1. sorting the input, 2. packing, 3. passing through the LSTM, and 4. padding, how do you recover the original order, i.e. the order before sorting?

Houjing_Huang · September 29, 2017, 1:38pm

It seems that the sorting is done manually beforehand, so you have to record the order at that point if you need it later.
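
A minimal sketch of recording the order and undoing it afterwards, assuming a recent PyTorch release. The names sort_idx and unsort_idx are illustrative only. The idea is that the argsort of a permutation is its inverse.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)

# Zero-padded batch in its original (unsorted) order, [batch, seq, feature]
batch_in = torch.tensor([[[1.], [0.], [0.]],
                         [[1.], [2.], [3.]],
                         [[1.], [2.], [0.]]])
lengths = torch.tensor([1, 3, 2])

# 1. sort by decreasing length and remember the permutation
sorted_lengths, sort_idx = lengths.sort(descending=True)
sorted_batch = batch_in[sort_idx]

# 2.-4. pack, run the RNN, pad
rnn = nn.RNN(input_size=1, hidden_size=2, batch_first=True)
packed = pack_padded_sequence(sorted_batch, sorted_lengths, batch_first=True)
out, _ = rnn(packed)
unpacked, unpacked_len = pad_packed_sequence(out, batch_first=True)

# 5. invert the permutation to get back the original order
unsort_idx = sort_idx.argsort()
restored = unpacked[unsort_idx]
restored_len = unpacked_len[unsort_idx]
print(restored_len)  # tensor([1, 3, 2])
```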

Houjing_Huang · September 29, 2017, 1:42pm

Is it true that in your words ‘hidden state’ means the cell state? That is, it refers to the c in the following equations?

Thank you.

Jonbean · November 6, 2017, 4:43pm

I really don’t see the purpose of using torch.nn.utils.rnn.pack_padded_sequence. If sorting and length calculation have already been handled outside using numpy, then we can just feed the input as a tensor directly. What is the benefit of wrapping it?
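
One observable difference, sketched here under the assumption of a recent PyTorch release and with made-up data: when the padded tensor is fed directly, the RNN keeps stepping over the padding, so the returned final hidden state of a short sequence is taken after the pad steps. With a packed input, the final hidden state corresponds to the last valid step of each sequence.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

torch.manual_seed(0)

batch_in = torch.tensor([[[1.], [2.], [3.]],
                         [[1.], [2.], [0.]],
                         [[1.], [0.], [0.]]])
seq_lengths = [3, 2, 1]
rnn = nn.RNN(input_size=1, hidden_size=2, batch_first=True)

# Without packing: every sequence is run for max_length steps
out_padded, h_padded = rnn(batch_in)

# With packing: each sequence stops at its own length
packed = pack_padded_sequence(batch_in, seq_lengths, batch_first=True)
_, h_packed = rnn(packed)

# Full-length sequence: both final states agree
full_same = torch.allclose(h_padded[0, 0], h_packed[0, 0], atol=1e-6)
# Length-1 sequence: the unpacked run has also consumed two pad steps
short_same = torch.allclose(h_padded[0, 2], h_packed[0, 2], atol=1e-6)
print(full_same, short_same)
```

This should print True followed by False. The packed final state of the shortest sequence equals out_padded[2, 0], the output of the unpacked run at that sequence's only valid step.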

yifanwang · December 9, 2017, 11:02pm

Agree.
The code runs without error only because batch_size is set equal to max_length; it won’t work if you change either of them. Also, the first parameter of nn.RNN should be input_size rather than the maximal sequence length. Besides, I would prefer writing vec_1 = torch.FloatTensor([[1], [2], [3]]) rather than vec_1 = torch.FloatTensor([[1, 2, 3]]), but here both are fine.

Here is a modified version:

import torch
import torch.nn as nn
from torch.autograd import Variable

batch_size = 4
max_length = 3
hidden_size = 2
n_layers = 1
feature_dim = 1

# container
batch_in = torch.zeros((batch_size, max_length, feature_dim))

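# data: one zero-padded column of shape [max_length, feature_dim] per sequence,
# so that each assignment into batch_in[i] matches in shape
vec_1 = torch.FloatTensor([[1], [2], [3]])
vec_2 = torch.FloatTensor([[1], [2], [0]])
vec_3 = torch.FloatTensor([[1], [0], [0]])
vec_4 = torch.FloatTensor([[2], [0], [0]])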

batch_in[0] = vec_1
batch_in[1] = vec_2
batch_in[2] = vec_3
batch_in[3] = vec_4

batch_in = Variable(batch_in)
print(batch_in.size())
seq_lengths = [3,2,1,1] # list of integers holding the true length of each sequence in the batch, in decreasing order

# pack it
pack = torch.nn.utils.rnn.pack_padded_sequence(batch_in, seq_lengths, batch_first=True)
print(pack)

rnn = nn.RNN(feature_dim, hidden_size, n_layers, batch_first=True) 
h0 = Variable(torch.randn(n_layers, batch_size, hidden_size))

#forward 
out, _ = rnn(pack, h0)

# unpack
unpacked, unpacked_len = torch.nn.utils.rnn.pad_packed_sequence(out, batch_first=True)
print(unpacked)

and the output:

torch.Size([4, 3, 1])

PackedSequence(data=Variable containing:
    1
    1
    1
    2
    2
    2
    3
[torch.FloatTensor of size 7x1]
, batch_sizes=[4, 2, 1])

Variable containing:
(0 ,.,.) = 
 -0.8313 -0.7238
 -0.9355  0.3213
 -0.9907 -0.0606

(1 ,.,.) = 
 -0.8365 -0.0670
 -0.9559 -0.0762
  0.0000  0.0000

(2 ,.,.) = 
 -0.2423 -0.1630
  0.0000  0.0000
  0.0000  0.0000

(3 ,.,.) = 
 -0.9419  0.0727
  0.0000  0.0000
  0.0000  0.0000
[torch.FloatTensor of size 4x3x2]

We can see that the last row of the second output is a zero vector. This is reasonable because we don’t intend to feed the PAD symbol into the RNN.

Even_Oldridge · December 21, 2017, 5:12am

@smth I’d love to hear an answer to this question as well. Is it the case that there’s no backprop for padding tokens? And if not is it simply zeroing the gradient or is it more complex?

Can you or someone on the team fill us in and/or point to the relevant place in the source where this is handled?

cswhjiang · December 27, 2017, 8:57am

I have the same question.

jusjusjus · December 29, 2017, 5:29pm

Dear Ajay,

because you’re looking for the right way, here’s my take on how the pipeline from unordered sequences to packed sequences would look. One point, though: padding the sequences first seems like unnecessary memory usage. Is there a way around that?

import torch
import numpy as np
from torch.autograd import Variable

# unordered list (batch) of random sized arrays
batch_size = 4
max_length = 4
sequences = [
    np.random.randn(np.random.randint(1, max_length+1))
    for _ in range(batch_size)
]

# reverse ordered by `len`
ordered = sorted(sequences, key=len, reverse=True)
lengths = [len(x) for x in ordered]

# each element padded to `max_length`
padded = [
    np.pad(li, pad_width=(0, max_length-len(li)), mode='constant')
    for li in ordered
]

# Convert each array to a float32 `torch.Tensor`
# (np.random.randn yields float64, while RNN weights default to float32)
tensors = [
    torch.from_numpy(ar).float()
    for ar in padded
]

# stack to matrix Variable
batch = Variable(torch.stack(tensors))
# add extra dim necessary to use with RNNs
# as pointed out by /u/kaushalshetty
batch = batch[:, :, None]

# pack it
pack = torch.nn.utils.rnn.pack_padded_sequence(batch, lengths, batch_first=True)
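
On the question of skipping the manual padding: newer releases ship torch.nn.utils.rnn.pack_sequence, which takes a list of variable-length tensors directly (the enforce_sorted argument used here assumes PyTorch 1.1 or later). This removes the padding step from user code. Whether it saves memory internally is not something this sketch demonstrates. The data is made up for illustration.

```python
import torch
from torch.nn.utils.rnn import pack_sequence, pad_packed_sequence

# Unordered list of sequences, each of shape [length, feature_dim]
sequences = [
    torch.tensor([[1.], [2.]]),
    torch.tensor([[3.]]),
    torch.tensor([[4.], [5.], [6.]]),
]

# No manual sorting or padding needed
packed = pack_sequence(sequences, enforce_sorted=False)
print(packed.data.shape)   # torch.Size([6, 1])
print(packed.batch_sizes)  # tensor([3, 2, 1])

# Padding back returns the sequences in their original order
padded, lengths = pad_packed_sequence(packed, batch_first=True)
print(lengths)  # tensor([2, 1, 3])
```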

cjlovering · January 17, 2018, 7:11am

The hidden state is h_t; c_t is the cell memory. RNN and GRU have only a hidden state, so for those it should be clear. (The hidden state is what is output at each time step.)
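
The distinction shows up in the return values. A short sketch, assuming a recent PyTorch release and arbitrary sizes: nn.LSTM returns a tuple (h_n, c_n) alongside the output, whereas nn.GRU returns only h_n.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(2, 5, 3)  # [batch, seq, feature]

lstm = nn.LSTM(input_size=3, hidden_size=4, batch_first=True)
gru = nn.GRU(input_size=3, hidden_size=4, batch_first=True)

# LSTM: hidden state h_n and cell state c_n are returned separately
lstm_out, (h_n, c_n) = lstm(x)
print(lstm_out.shape, h_n.shape, c_n.shape)
# torch.Size([2, 5, 4]) torch.Size([1, 2, 4]) torch.Size([1, 2, 4])

# GRU: only a hidden state
gru_out, gru_h = gru(x)
print(gru_out.shape, gru_h.shape)
# torch.Size([2, 5, 4]) torch.Size([1, 2, 4])

# The per-step output is the hidden state h_t, not the cell state c_t
out_is_hidden = torch.allclose(lstm_out[:, -1], h_n[0])
print(out_is_hidden)  # True
```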

kaushalshetty · January 26, 2018, 6:00pm

I think you missed
batch = batch.view(4,4,1)

jusjusjus · January 28, 2018, 4:56am

Which error do you get?

kaushalshetty · January 28, 2018, 8:02am

I got a size mismatch error when feeding it to an RNN. Printing pack gives a packed sequence of size 14, but it has to be 14,1, where 1 is the feature_dim, right? Do correct me if I am wrong.
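
The shape difference described above can be reproduced in isolation. This is a sketch assuming a recent PyTorch release. The lengths are hypothetical values chosen so that they sum to 14, matching the size mentioned in the post.

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

# Padded batch without a feature dimension: [batch, seq]
batch = torch.zeros(4, 4)
lengths = [4, 4, 3, 3]  # hypothetical lengths summing to 14

packed_2d = pack_padded_sequence(batch, lengths, batch_first=True)
print(packed_2d.data.shape)  # torch.Size([14])

# Add the trailing feature dimension the RNN expects: [batch, seq, 1]
batch_3d = batch.unsqueeze(-1)  # same effect as batch[:, :, None]
packed_3d = pack_padded_sequence(batch_3d, lengths, batch_first=True)
print(packed_3d.data.shape)  # torch.Size([14, 1])
```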