Simple working example of how to use packing for variable-length sequence inputs to an RNN


(Ajay Talati) #1

Hi,

Updated - here’s a simple example of how I think you use pack_padded_sequence and pad_packed_sequence, but I don’t know if it’s the right way to use them?

import torch
import torch.nn as nn
from torch.autograd import Variable

batch_size = 3
max_length = 3
hidden_size = 2
n_layers =1

# container
batch_in = torch.zeros((batch_size, 1, max_length))

#data
vec_1 = torch.FloatTensor([[1, 2, 3]])
vec_2 = torch.FloatTensor([[1, 2, 0]])
vec_3 = torch.FloatTensor([[1, 0, 0]])

batch_in[0] = vec_1
batch_in[1] = vec_2
batch_in[2] = vec_3

batch_in = Variable(batch_in)

seq_lengths = [3,2,1] # length of each sequence in the batch, in decreasing order

# pack it
pack = torch.nn.utils.rnn.pack_padded_sequence(batch_in, seq_lengths, batch_first=True)

>>> pack
PackedSequence(data=Variable containing:
 1  2  3
 1  2  0
 1  0  0
[torch.FloatTensor of size 3x3]
, batch_sizes=[3])


# initialize
rnn = nn.RNN(max_length, hidden_size, n_layers, batch_first=True) 
h0 = Variable(torch.randn(n_layers, batch_size, hidden_size))

#forward 
out, _ = rnn(pack, h0)

# unpack
unpacked, unpacked_len = torch.nn.utils.rnn.pad_packed_sequence(out)

>>> unpacked
Variable containing:
(0 ,.,.) = 
 -0.7883 -0.7972
  0.3367 -0.6102
  0.1502 -0.4654
[torch.FloatTensor of size 1x3x2]



James @jekbradbury, Adam @apaszke, does this look right to you guys? Can you help with clarifying the docs?

It would be nice to have more simple working examples in the docs. That was one of the real joys of using torch: it was almost as easy to use as numpy :slight_smile:

Thanks a lot for your help :slight_smile:


(Keith Yin) #2

According to the docs, the input of the RNN should be a Variable of shape [seq, batch, feature], or [batch, seq, feature] if batch_first=True. The input in your code is [batch, feature, seq].
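
A minimal sketch of the two layouts, assuming PyTorch 0.4 or later, where plain tensors can be passed without wrapping them in Variable. The sizes used here are illustrative and not taken from the original example.

```python
import torch
import torch.nn as nn

batch_size, seq_len, feature_dim, hidden_size = 3, 5, 1, 2

# batch_first=True: input is [batch, seq, feature]
rnn_bf = nn.RNN(feature_dim, hidden_size, batch_first=True)
x_bf = torch.zeros(batch_size, seq_len, feature_dim)
out_bf, h_bf = rnn_bf(x_bf)

# default (batch_first=False): input is [seq, batch, feature]
rnn_sf = nn.RNN(feature_dim, hidden_size)
x_sf = x_bf.transpose(0, 1).contiguous()
out_sf, h_sf = rnn_sf(x_sf)

print(out_bf.shape)            # torch.Size([3, 5, 2])
print(out_sf.shape)            # torch.Size([5, 3, 2])
print(h_bf.shape, h_sf.shape)  # both torch.Size([1, 3, 2])
```

Note that batch_first only changes the layout of the input and output tensors; the hidden state stays [num_layers, batch, hidden_size] in both cases.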


(Ajay Talati) #3

Hi Keith @KeithYin,

thanks a lot for spotting that! I’ve updated the example :blush:

It seems the tedious part of all this padding is sorting the input vectors by length before they go into the RNN, and then mapping the output back to the original order?

I’ll keep adding to the example :smile:

Cheers,

Aj


(itzjustricky) #4

When I run the simple example that you have provided, the content of unpacked_len is [1, 1, 1] and the unpacked variable is as shown above.

I expected unpacked_len to be [3, 2, 1] and unpacked to be of size [3x3x2] (with some zero padding), since normally the output will contain the hidden state for each time step, as stated in the docs.

output (seq_len, batch, hidden_size * num_directions): tensor containing the output features (h_k) from the last layer of the RNN, for each k.

Is PackedSequence only designed to help the programmer retrieve the last output (i.e. h_n tensor containing the hidden state for k=seq_len)?


#5

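nn.LSTM, nn.RNN, etc. return the final hidden state h_n (last time step only) for every layer, plus output, which holds the last layer's hidden state at every time step. The hidden states of the lower layers at intermediate time steps are not returned or retrievable.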
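
A small sketch of what a two-layer nn.RNN returns, assuming PyTorch 0.4 or later and a unidirectional network. The sizes are made up for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=1, hidden_size=2, num_layers=2, batch_first=True)
x = torch.randn(4, 3, 1)  # [batch, seq, feature]
output, h_n = rnn(x)

print(output.shape)  # torch.Size([4, 3, 2]): last layer, every time step
print(h_n.shape)     # torch.Size([2, 4, 2]): every layer, final time step only

# For a unidirectional RNN, the last time step of `output`
# is the last layer's entry in h_n.
last_step_matches = torch.allclose(output[:, -1], h_n[-1])
print(last_step_matches)
```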


(itzjustricky) #6

I see. Thank you for clarifying my misunderstanding!


(Gopal Sharma) #8

How exactly are variable-length sequences handled? How do you make sure that there is no backprop through the padded steps of sequences that are shorter than the longest one?
Is it the case that, since the output of the RNN for a shorter sequence becomes zero after its original length, the gradients also become zero?
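
One way to check this empirically is to ask autograd for the gradient with respect to the padded input. The sketch below assumes PyTorch 0.4 or later and reuses the toy values from the first post. Padded positions are never copied into the packed data, so they should receive an exactly zero gradient.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

torch.manual_seed(0)
lengths = [3, 2, 1]
# [batch, seq, feature=1]; zeros after each sequence's length are padding
batch = torch.tensor([[[1.], [2.], [3.]],
                      [[1.], [2.], [0.]],
                      [[1.], [0.], [0.]]], requires_grad=True)

rnn = nn.RNN(input_size=1, hidden_size=2, batch_first=True)
packed = pack_padded_sequence(batch, lengths, batch_first=True)
packed_out, h_n = rnn(packed)

# Toy loss over every real time step
packed_out.data.sum().backward()

grad = batch.grad.squeeze(-1)  # [batch, seq]
print(grad)
padded_grads = [grad[1, 2].item(), grad[2, 1].item(), grad[2, 2].item()]
print(padded_grads)  # gradients at the three padded positions
```

This only shows that nothing flows back into the padded inputs. If a loss is computed on the re-padded output, the zero rows there are constants and do not depend on the weights.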


#9

batch_in size is still incorrect.

It should be [batch, seq, feature_size] if batch_first=True, while batch_in is [batch, feature, seq] in your example.


(Nitish Gupta) #10

I have a follow-up question. After 1. sorting the input, 2. packing, 3. passing through the LSTM, and 4. padding, how do you recover the original order, i.e. the order before sorting?


(Houjing Huang) #11

It seems that the sorting is done manually beforehand, so you have to record the order at that point if you need it later.
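
The bookkeeping can be sketched in plain Python: keep the sorting permutation, build its inverse, and use the inverse to put per-sequence results back in the original order. The sequences and the sum() stand-in for the RNN are illustrative only.

```python
sequences = [[5, 6], [1, 2, 3, 4], [7], [8, 9, 10]]

# Indices that sort the batch by decreasing length
order = sorted(range(len(sequences)), key=lambda i: len(sequences[i]), reverse=True)
sorted_seqs = [sequences[i] for i in order]
lengths = [len(s) for s in sorted_seqs]

# Inverse permutation: inverse[original_index] = position in the sorted batch
inverse = [0] * len(order)
for sorted_pos, original_idx in enumerate(order):
    inverse[original_idx] = sorted_pos

# Stand-in for per-sequence results computed in sorted order
results_sorted = [sum(s) for s in sorted_seqs]
results_original = [results_sorted[inverse[i]] for i in range(len(sequences))]

print(order)             # [1, 3, 0, 2]
print(lengths)           # [4, 3, 2, 1]
print(results_original)  # [11, 10, 7, 27]
```

With tensors, the same inverse permutation can be used to index the batch dimension of the unpacked output. Newer PyTorch releases also accept enforce_sorted=False in pack_padded_sequence, which does this bookkeeping internally.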


(Houjing Huang) #12

Is it true that in your words ‘hidden state’ means the cell state? That is, it refers to the c in the following equations?

Thank you.


(Jonbean) #13

I really don’t see the purpose of using torch.nn.utils.rnn.pack_padded_sequence. If sorting and length calculation have already been handled outside using numpy, then we can just feed the input as a tensor directly. What is the benefit of wrapping it?
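
One observable difference is the final hidden state. The sketch below (assuming PyTorch 0.4 or later, toy values from the first post) runs the same RNN on the padded tensor and on the packed version. With packing, h_n for each sequence is the state at its last real step; without it, the RNN keeps consuming the pad values.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

torch.manual_seed(0)
lengths = [3, 2, 1]
batch = torch.tensor([[[1.], [2.], [3.]],
                      [[1.], [2.], [0.]],
                      [[1.], [0.], [0.]]])  # [batch, seq, feature=1]
rnn = nn.RNN(input_size=1, hidden_size=2, batch_first=True)

with torch.no_grad():
    out_padded, h_padded = rnn(batch)  # pads are processed like real inputs
    _, h_packed = rnn(pack_padded_sequence(batch, lengths, batch_first=True))

# Packed h_n equals the padded run's output at each sequence's last real step
matches_last_valid = [
    torch.allclose(h_packed[0, i], out_padded[i, n - 1], atol=1e-5)
    for i, n in enumerate(lengths)
]
# Compare with the padded run's final state, which has also seen the pads
same_as_padded = [
    torch.allclose(h_packed[0, i], h_padded[0, i], atol=1e-5)
    for i in range(len(lengths))
]
print(matches_last_valid)
print(same_as_padded)
```

The first list should be all True. In the second, only the full-length sequence is guaranteed to agree, since the shorter ones have been pushed through extra pad steps in the unpacked run.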


(Yifan) #14

Agree.
The code only runs without error because batch_size is set equal to max_length; it won’t work if you change either of them. Also, the first parameter of nn.RNN should be input_size rather than the maximum sequence length. Besides, I would prefer writing vec_1 = torch.FloatTensor([[1], [2], [3]]) rather than vec_1 = torch.FloatTensor([[1, 2, 3]]), but here both are fine.

Here is a modified version:

import torch
import torch.nn as nn
from torch.autograd import Variable

batch_size = 4
max_length = 3
hidden_size = 2
n_layers =1
feature_dim = 1

# container
batch_in = torch.zeros((batch_size, max_length, feature_dim))

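# data, written as column vectors so that each one matches the
# [max_length, feature_dim] shape of a row of batch_in
vec_1 = torch.FloatTensor([[1], [2], [3]])
vec_2 = torch.FloatTensor([[1], [2], [0]])
vec_3 = torch.FloatTensor([[1], [0], [0]])
vec_4 = torch.FloatTensor([[2], [0], [0]])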

batch_in[0] = vec_1
batch_in[1] = vec_2
batch_in[2] = vec_3
batch_in[3] = vec_4

batch_in = Variable(batch_in)
print(batch_in.size())
seq_lengths = [3,2,1,1] # length of each sequence in the batch, in decreasing order

# pack it
pack = torch.nn.utils.rnn.pack_padded_sequence(batch_in, seq_lengths, batch_first=True)
print(pack)

rnn = nn.RNN(feature_dim, hidden_size, n_layers, batch_first=True) 
h0 = Variable(torch.randn(n_layers, batch_size, hidden_size))

#forward 
out, _ = rnn(pack, h0)

# unpack
unpacked, unpacked_len = torch.nn.utils.rnn.pad_packed_sequence(out, batch_first=True)
print(unpacked)

and the output:

torch.Size([4, 3, 1])

PackedSequence(data=Variable containing:
    1
    1
    1
    2
    2
    2
    3
[torch.FloatTensor of size 7x1]
, batch_sizes=[4, 2, 1])

Variable containing:
(0 ,.,.) = 
 -0.8313 -0.7238
 -0.9355  0.3213
 -0.9907 -0.0606

(1 ,.,.) = 
 -0.8365 -0.0670
 -0.9559 -0.0762
  0.0000  0.0000

(2 ,.,.) = 
 -0.2423 -0.1630
  0.0000  0.0000
  0.0000  0.0000

(3 ,.,.) = 
 -0.9419  0.0727
  0.0000  0.0000
  0.0000  0.0000
[torch.FloatTensor of size 4x3x2]

We can see that the last row of the second output is a zero vector. This is reasonable: the second sequence has length 2, so its third time step is padding, and we don’t intend to feed the PAD symbol into the RNN.
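
The batch_sizes=[4, 2, 1] and the order of the packed data shown above can be reproduced by hand. This is a plain-Python sketch of the layout, not the library's implementation: at each time step, only the sequences that are still running contribute a row.

```python
lengths = [3, 2, 1, 1]
padded = [[1, 2, 3],
          [1, 2, 0],
          [1, 0, 0],
          [2, 0, 0]]

max_len = max(lengths)

# Number of sequences still active at time step t
batch_sizes = [sum(1 for n in lengths if n > t) for t in range(max_len)]

# Time-major concatenation of the active entries only
data = [padded[i][t]
        for t in range(max_len)
        for i in range(len(lengths))
        if lengths[i] > t]

print(batch_sizes)  # [4, 2, 1]
print(data)         # [1, 1, 1, 2, 2, 2, 3]
```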


(Even Oldridge) #15

@smth I’d love to hear an answer to this question as well. Is it the case that there’s no backprop for padding tokens? And if so, is it simply zeroing the gradient, or is it more complex?

Can you or someone on the team fill us in and/or point to the relevant place in the source where this is handled?


#16

I have the same question.


(Justus Schwabedal) #17

Dear Ajay,

because you’re looking for the right way, here’s my take on how the pipeline from unordered sequences to packed sequences would look. One point though: it seems like unnecessary memory usage to pad the sequences first. Is there a way around that?

import torch
import numpy as np
from torch.autograd import Variable

# unordered list (batch) of random sized arrays
batch_size = 4
max_length = 4
sequences = [
    np.random.randn(np.random.randint(1, max_length+1))
    for _ in range(batch_size)
]

# reverse ordered by `len`
ordered = sorted(sequences, key=len, reverse=True)
lengths = [len(x) for x in ordered]

# each element padded to `max_length`
padded = [
    np.pad(li, pad_width=(0, max_length-len(li)), mode='constant')
    for li in ordered
]

# Convert each array to a float32 `torch.Tensor`
# (np.random.randn returns float64, while RNN weights default to float32)
tensors = [
    torch.from_numpy(ar).float()
    for ar in padded
]

# stack to matrix Variable
batch = Variable(torch.stack(tensors))
# add extra dim necessary to use with RNNs
# as pointed out by /u/kaushalshetty
batch = batch[:, :, None]

# pack it
pack = torch.nn.utils.rnn.pack_padded_sequence(batch, lengths, batch_first=True)
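
On the question of skipping the manual padding: newer PyTorch releases provide torch.nn.utils.rnn.pack_sequence, which takes a list of tensors directly. The sketch below assumes such a release and uses made-up values. As far as I can tell it still builds a padded tensor internally, so treat it as a convenience for user code rather than a guaranteed memory saving.

```python
import torch
from torch.nn.utils.rnn import pack_sequence

# Already ordered by decreasing length, as pack_sequence expects by default
sequences = [torch.tensor([1., 2., 3.]),
             torch.tensor([4., 5.]),
             torch.tensor([6.])]

# Add the feature dimension: each tensor becomes [len, 1]
sequences = [s.unsqueeze(-1) for s in sequences]

packed = pack_sequence(sequences)

print(packed.data.squeeze(-1))  # tensor([1., 4., 6., 2., 5., 3.])
print(packed.batch_sizes)       # tensor([3, 2, 1])
```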

(Charles Lovering) #18

The hidden state is h_t; c_t is the cell memory. RNN and GRU have only a hidden state, so for those it should be clear. (The hidden state is what is output at each time step.)
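
A short sketch of the difference in return values, assuming PyTorch 0.4 or later and illustrative sizes: nn.LSTM returns a tuple (h_n, c_n) next to the output, while nn.GRU returns only h_n.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(4, 3, 1)  # [batch, seq, feature]

lstm = nn.LSTM(input_size=1, hidden_size=2, batch_first=True)
out_lstm, (h_n, c_n) = lstm(x)

gru = nn.GRU(input_size=1, hidden_size=2, batch_first=True)
out_gru, h_gru = gru(x)

print(h_n.shape, c_n.shape)  # both torch.Size([1, 4, 2])
print(h_gru.shape)           # torch.Size([1, 4, 2])

# The per-step output carries h_t, not c_t
output_is_hidden = torch.allclose(out_lstm[:, -1], h_n[0])
print(output_is_hidden)
```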


(Kaushal Shetty) #19

I think you missed
batch = batch.view(4,4,1)


(Justus Schwabedal) #20

Which error do you get?


(Kaushal Shetty) #21

I got a size mismatch error when feeding it to an RNN. Printing pack gives a packed sequence of size 14, but it has to be 14,1, where 1 is the feature_dim, right? Do correct me if I am wrong.
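
The effect of the missing feature dimension can be seen on the packed data alone. The sketch below assumes PyTorch 0.4 or later and uses fixed lengths instead of the random ones above, so the totals are 10 rather than 14.

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

lengths = [4, 3, 2, 1]
batch_2d = torch.zeros(4, 4)     # [batch, seq], no feature dimension
batch_3d = batch_2d[:, :, None]  # [batch, seq, feature=1]

pack_2d = pack_padded_sequence(batch_2d, lengths, batch_first=True)
pack_3d = pack_padded_sequence(batch_3d, lengths, batch_first=True)

print(pack_2d.data.shape)  # torch.Size([10])
print(pack_3d.data.shape)  # torch.Size([10, 1])
```

An RNN built with input_size=1 expects the second form, where every packed row has a feature axis of size 1.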