Simple working example of how to use packing for variable-length sequence inputs to an RNN


#9

batch_in size is still incorrect.

It should be [batch, seq, feature_size] if batch_first=True, while batch_in is [seq, feature, batch] in your example.


(Nitish Gupta) #10

I have a follow-up question. After (1) sorting the input, (2) packing, (3) passing it through the LSTM, and (4) padding, how do you recover the original order, i.e. the order before sorting?


(Houjing Huang) #11

It seems that the sorting procedure is done manually beforehand, so you have to record the permutation at sorting time if you need to restore the original order later, as in the sketch below.
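For example, a minimal sketch (mine, not from the original post) of recording the sort permutation and inverting it afterwards:

import torch

# toy batch of 3 padded sequences, batch_first: (batch, seq, feature)
batch = torch.FloatTensor([[[1], [2], [0]],
                           [[3], [4], [5]],
                           [[6], [0], [0]]])
lengths = torch.LongTensor([2, 3, 1])

# sort by length (descending), remembering the permutation
sorted_lengths, sort_idx = lengths.sort(0, descending=True)
sorted_batch = batch[sort_idx]

# ... pack, run the RNN, unpack with pad_packed_sequence ...

# invert the permutation to recover the original order
_, unsort_idx = sort_idx.sort(0)
restored = sorted_batch[unsort_idx]

print(torch.equal(restored, batch))  # True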


(Houjing Huang) #12

Is it true that by ‘hidden state’ you mean the cell state? That is, does it refer to the c in the following equations?
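For reference, the standard LSTM update equations, with $c_t$ the cell state and $h_t$ the hidden state:

$$i_t = \sigma(W_{ii} x_t + b_{ii} + W_{hi} h_{t-1} + b_{hi})$$
$$f_t = \sigma(W_{if} x_t + b_{if} + W_{hf} h_{t-1} + b_{hf})$$
$$g_t = \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{t-1} + b_{hg})$$
$$o_t = \sigma(W_{io} x_t + b_{io} + W_{ho} h_{t-1} + b_{ho})$$
$$c_t = f_t \odot c_{t-1} + i_t \odot g_t$$
$$h_t = o_t \odot \tanh(c_t)$$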

Thank you.


(Jonbean) #13

I really don’t see the purpose of using torch.nn.utils.rnn.pack_padded_sequence. If sorting and length calculation have already been handled outside using numpy, then we can just feed the input as a tensor directly. What is the benefit of wrapping it?


(Yifan) #14

Agree.
The reason the code can run without error is that batch_size is set equal to max_length; it won’t work if you change either of them. Also, the first parameter of nn.RNN should be input_size rather than the maximal sequence length. Besides, I would prefer writing vec_1 = torch.FloatTensor([[1], [2], [3]]) rather than vec_1 = torch.FloatTensor([[1, 2, 3]]), but here both are fine.

Here is a modified version:

import torch
import torch.nn as nn
from torch.autograd import Variable

batch_size = 4
max_length = 3
hidden_size = 2
n_layers = 1
feature_dim = 1

# container
batch_in = torch.zeros((batch_size, max_length, feature_dim))

# data
vec_1 = torch.FloatTensor([[1], [2], [3]])
vec_2 = torch.FloatTensor([[1], [2], [0]])
vec_3 = torch.FloatTensor([[1], [0], [0]])
vec_4 = torch.FloatTensor([[2], [0], [0]])

batch_in[0] = vec_1
batch_in[1] = vec_2
batch_in[2] = vec_3
batch_in[3] = vec_4

batch_in = Variable(batch_in)
print(batch_in.size())
seq_lengths = [3, 2, 1, 1] # length of each (pre-sorted) sequence in the batch

# pack it
pack = torch.nn.utils.rnn.pack_padded_sequence(batch_in, seq_lengths, batch_first=True)
print(pack)

rnn = nn.RNN(feature_dim, hidden_size, n_layers, batch_first=True) 
h0 = Variable(torch.randn(n_layers, batch_size, hidden_size))

#forward 
out, _ = rnn(pack, h0)

# unpack
unpacked, unpacked_len = torch.nn.utils.rnn.pad_packed_sequence(out, batch_first=True)
print(unpacked)

and the output:

torch.Size([4, 3, 1])

PackedSequence(data=Variable containing:
    1
    1
    1
    2
    2
    2
    3
[torch.FloatTensor of size 7x1]
, batch_sizes=[4, 2, 1])

Variable containing:
(0 ,.,.) = 
 -0.8313 -0.7238
 -0.9355  0.3213
 -0.9907 -0.0606

(1 ,.,.) = 
 -0.8365 -0.0670
 -0.9559 -0.0762
  0.0000  0.0000

(2 ,.,.) = 
 -0.2423 -0.1630
  0.0000  0.0000
  0.0000  0.0000

(3 ,.,.) = 
 -0.9419  0.0727
  0.0000  0.0000
  0.0000  0.0000
[torch.FloatTensor of size 4x3x2]

We can see that the padded positions of the output (e.g. the last row of the second sequence) are zero vectors. This is reasonable, because we never feed the PAD symbol into the RNN.


(Even Oldridge) #15

@smth I’d love to hear an answer to this question as well. Is it the case that there’s no backprop for padding tokens? And if so, is it simply a matter of zeroing the gradient, or is it more complex?

Can you or someone on the team fill us in and/or point to the relevant place in the source where this is handled?
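Not an official answer, but one way to check empirically (my own sketch, written against the 0.4-style API): the padded positions never enter the packed data, so no gradient flows back to them.

import torch
import torch.nn as nn

# two sequences; the last step of the second one is PAD
batch = torch.tensor([[[1.], [2.], [3.]],
                      [[4.], [5.], [0.]]], requires_grad=True)
pack = nn.utils.rnn.pack_padded_sequence(batch, [3, 2], batch_first=True)

rnn = nn.RNN(input_size=1, hidden_size=2, batch_first=True)
out, _ = rnn(pack)

# backprop from the packed output; the PAD position receives zero gradient
out.data.sum().backward()
print(batch.grad[1, 2])  # tensor([0.])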


#16

I have the same question.


(Justus Schwabedal) #17

Dear Ajay,

because you’re looking for the right way, here’s my take on how the pipeline from unordered sequences to packed sequences would look. One point though: it seems like unnecessary memory usage to pad the sequences first; is there a way around that?

import torch
import numpy as np
from torch.autograd import Variable

# unordered list (batch) of random sized arrays
batch_size = 4
max_length = 4
sequences = [
    np.random.randn(np.random.randint(1, max_length+1))
    for _ in range(batch_size)
]

# reverse ordered by `len`
ordered = sorted(sequences, key=len, reverse=True)
lengths = [len(x) for x in ordered]

# each element padded to `max_length`
padded = [
    np.pad(li, pad_width=(0, max_length-len(li)), mode='constant')
    for li in ordered
]

# Convert each array to `torch.Tensor`
tensors = [
    torch.from_numpy(ar)
    for ar in padded
]

# stack to matrix Variable
batch = Variable(torch.stack(tensors))
# add extra dim necessary to use with RNNs
# as pointed out by /u/kaushalshetty
batch = batch[:, :, None]

# pack it
pack = torch.nn.utils.rnn.pack_padded_sequence(batch, lengths, batch_first=True)
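To actually run the pack through an RNN (my continuation, not part of the original post), note that torch.from_numpy on randn output yields float64, while nn.RNN weights are float32 by default:

# cast to float32 first, since nn.RNN expects FloatTensor input
pack = torch.nn.utils.rnn.pack_padded_sequence(
    batch.float(), lengths, batch_first=True)

rnn = torch.nn.RNN(input_size=1, hidden_size=2, batch_first=True)
out, h_n = rnn(pack)

# unpack back to a padded (batch, seq, feature) tensor
unpacked, unpacked_len = torch.nn.utils.rnn.pad_packed_sequence(out, batch_first=True)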

(Charles Lovering) #18

The hidden state is h_t; c_t is the cell memory. RNN and GRU have only a hidden state, so for those it should be clear. (The hidden state is what is output at each time step.)


(Kaushal Shetty) #19

I think you missed
batch = batch.view(4,4,1)


(Justus Schwabedal) #20

Which error do you get?


(Kaushal Shetty) #21

I got a size mismatch error when feeding it to an RNN. Printing pack gives a packed sequence of size 14, but it has to be (14, 1), where 1 is feature_dim, right? Do correct me if I am wrong.


(Justus Schwabedal) #22

Yeah, I think the input to all RNN-type modules needs to have a filter/channel dimension, or whatever you’d want to call it.


(Adi R) #23

I have not seen any examples that handle padding/packing when computing the loss.

Suppose I have a tagger (i.e. for each input token I have an output label). Can I use padded/packed sequences to compute the loss as well?
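One common pattern (a sketch of mine, not from this thread) is to pad the target labels with a sentinel value and tell the loss to skip it via ignore_index:

import torch
import torch.nn as nn

PAD_LABEL = -100  # sentinel for padded positions (CrossEntropyLoss's default ignore_index)

# logits: (batch, seq, n_classes); targets padded with PAD_LABEL
logits = torch.randn(2, 3, 5)
targets = torch.tensor([[1, 2, 3],
                        [4, 0, PAD_LABEL]])  # second sequence has length 2

loss_fn = nn.CrossEntropyLoss(ignore_index=PAD_LABEL)
loss = loss_fn(logits.view(-1, 5), targets.view(-1))  # PAD positions contribute nothing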


(Sherin Thomas) #24

Now that pack_sequence is available in master (it should be in 0.4), you don’t have to worry about padding your input with zeros and calling pack_padded_sequence:

>>> import torch
>>> import torch.nn.utils.rnn as rnn_utils
>>> a = torch.Tensor([1, 2, 3])
>>> b = torch.Tensor([4, 5])
>>> c = torch.Tensor([6])
>>> packed = rnn_utils.pack_sequence([a, b, c])
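For reference, the packed result should look like this (0.4-style printing):

>>> packed
PackedSequence(data=tensor([ 1.,  4.,  6.,  2.,  5.,  3.]), batch_sizes=tensor([ 3,  2,  1]))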

But if you only need to pad your sequences, you can use pad_sequence:

>>> import torch
>>> import torch.nn.utils.rnn as rnn_utils
>>> a = torch.Tensor([1, 2, 3])
>>> b = torch.Tensor([4, 5])
>>> c = torch.Tensor([6])
>>> rnn_utils.pad_sequence([a, b, c], batch_first=True)

 1  2  3
 4  5  0
 6  0  0
[torch.FloatTensor of size (3,3)]

(jpeg729) #25

With pytorch 0.3.1.post2

AttributeError: module 'torch.nn.utils.rnn' has no attribute 'pad_sequence'
AttributeError: module 'torch.nn.utils.rnn' has no attribute 'pack_sequence'

(Sherin Thomas) #26

Looks like my mistake; it is available in the current master and will probably be in the 0.4 release. Updated my answer!


(Sitara J) #27

Hi, I ran your code and found some errors. The size of batch_in should be (batch_size, feature_dim, max_length). I changed it, but then I got a new error:
“dimension out of range (expected to be in range of [-1, 0], but got 1)”

I don’t know what it means; maybe you can try it and tell me something about it. Thank you!


(Sitara J) #28

When I run the simple example you provided, I run into the error
“dimension out of range (expected to be in range of [-1, 0], but got 1)”
Does anybody else have the same problem? Can someone tell me why, and how to fix it?