Understanding pack_padded_sequence and pad_packed_sequence


(Vijendra Rana) #1

Hi,
I'm having trouble understanding these two utilities and can't figure out what they do.

For example, I was trying to replicate the example from Simple working example how to use packing for variable-length sequence inputs for rnn

I have followed the PyTorch documentation and coded it with batch_first=True:

import torch
import torch.nn as nn
from torch.autograd import Variable

batch_size = 3
max_length = 3
hidden_size = 2
n_layers = 1
num_input_features = 1

# Pad every sequence to max_length with zeros; the feature dimension is 1.
input_tensor = torch.zeros(batch_size, max_length, num_input_features)
input_tensor[0] = torch.FloatTensor([1, 2, 3])
input_tensor[1] = torch.FloatTensor([4, 5, 0])
input_tensor[2] = torch.FloatTensor([6, 0, 0])
batch_in = Variable(input_tensor)

# Actual (unpadded) lengths, sorted in decreasing order.
seq_lengths = [3, 2, 1]
pack = torch.nn.utils.rnn.pack_padded_sequence(batch_in, seq_lengths, batch_first=True)
print(pack)

Here I get output as

PackedSequence(data=Variable containing:
 1
 4
 6
 2
 5
 3
[torch.FloatTensor of size 6x1]
, batch_sizes=[3, 2, 1])

I can retrieve the original padded sequence back if I do

torch.nn.utils.rnn.pad_packed_sequence(pack, batch_first=True)

which is obvious.

But can somebody help me understand how and why we got that output 'pack' with size (6, 1)? Also, what is the functionality in general: why do we need these two utilities, and how are they useful?
Thanks in advance for the help.

Cheers,
Vijendra


(Samarth Sinha) #2

They are used for sequence-to-sequence models with variable-length inputs. A sentence, for example, can have any length, and to feed it into any kind of RNN you need to be able to get the output at the right time step.

Furthermore, check your dimensions and compare them to the original link: you should be getting a 3x3 PackedSequence, but you have a small bug in the code, which is why you are getting a 6x1 PackedSequence.


#3

Actually, the original example set the wrong shape for its input; that's why you get 6x1 instead of 3x3.

The original example used the shape [batch_size, 1, max_length], which is obviously wrong.

Without packing, you have to unroll your RNN to a fixed length, and you get a fixed-length output. Since the desired outputs have different lengths, you have to mask them yourself.

If you feed your mini-batch with packing, the output for each example keeps its own length, so you don't have to mask anything yourself.
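This difference can be seen directly by running the same mini-batch through an RNN both ways. A minimal sketch in current PyTorch (Variable is no longer needed; the tiny shapes here are made up for illustration):

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

# Padded mini-batch: 2 sequences with real lengths 3 and 1 (0 is padding).
batch = torch.tensor([[[1.], [1.], [1.]],
                      [[1.], [0.], [0.]]])
lengths = [3, 1]

rnn = nn.RNN(input_size=1, hidden_size=2, batch_first=True)

# Without packing: the RNN runs over every time step, pads included.
out_padded, _ = rnn(batch)
print(out_padded.shape)       # torch.Size([2, 3, 2])

# With packing: pad positions are skipped entirely.
packed = pack_padded_sequence(batch, lengths, batch_first=True)
out_packed, _ = rnn(packed)
print(out_packed.data.shape)  # torch.Size([4, 2]) -- only the 3 + 1 real steps
```

The packed output has one row per real token, so no masking is needed afterwards.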


(Moonlightlane) #4

Hi there, thanks for the clarification! I also saw these two posts, compared them, and can confirm the original post has the wrong dimensions.

I am not sure I understand how the PyTorch RNN operates, though. For example, I don't necessarily need to use pack_padded_sequence, correct? I can simply zero-pad all sequences in a mini-batch to the longest sequence manually, and then feed the batch into the RNN, which accepts input of dimension [seq_len, batch, input_size]? I think doing so (manually padding each sequence) is the same as using the pack_padded_sequence function, correct?

Thank you in advance for a further clarification!


#5

Right, you don't have to use pack_padded_sequence. Padding is fine, but it is different from using pack_padded_sequence: for packed input, the RNN will not perform calculations on the pad elements.

For example, suppose you have a padded mini-batch of size 2, where zero is padding:

1 1 1
1 0 0

The output will be 3 (seq length) x 2 (batch size). A packed input, however, will result in a packed output containing (3 x 1 and 1 x 1). If you feed the pack into the RNN, it will not calculate output for your pad elements. Moreover, the hidden state will be the hidden state after the last valid input, instead of the hidden state after the last zero pad (which is what you get if you feed the pads into the RNN).

The RNN does not actually distinguish pad elements from valid elements; it performs the same calculation on both. You may then need to clean the output (e.g., mask it) to get the result you want. A dynamic RNN (fed with packed input) does not have this problem.
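The effect on the hidden state can be checked with a one-sequence example. A minimal sketch in current PyTorch (the sequence values are made up):

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

torch.manual_seed(0)
rnn = nn.RNN(input_size=1, hidden_size=2, batch_first=True)

# One sequence with real length 2, zero-padded to length 4.
padded = torch.tensor([[[1.], [2.], [0.], [0.]]])

# Feeding the padded tensor: hidden is taken after the last pad step.
_, h_after_pads = rnn(padded)

# Feeding the packed version: hidden is taken after the last valid input.
_, h_packed = rnn(pack_padded_sequence(padded, [2], batch_first=True))

# Running only the 2 valid steps reproduces the packed hidden state.
_, h_valid = rnn(padded[:, :2])
print(torch.allclose(h_packed, h_valid))      # True
print(torch.allclose(h_packed, h_after_pads)) # False -- the pads moved the state
```

So packing gives you the hidden state after the last valid input for free, while padded input keeps updating the state through the pad steps.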


#6

I'm getting a similar issue here.

The inputs provided to pack_padded_sequence are sent and sent_len, where sent is the input of shape (batch_size, seq_length, features/embedding_dim), with dimensions [torch.FloatTensor of size 100x16x200], and sent_len is a list containing the actual (unpadded) lengths of all sequences in the batch.

print sent.size() # (100L, 16L, 200L), i.e., (batch size, padded sequence length,embedding dimension / feature)
print len(sent_len) # 100, i.e., the length of the list containing sequence lengths of each element in the batch

sent_packed = nn.utils.rnn.pack_padded_sequence(sent, sent_len,batch_first=True)
print sent_packed

Now, the output sent_packed is expected to have the same dimensions as sent, right?
I am getting the following:

PackedSequence(data=Variable containing:
 0.1084  0.3546  0.4458  ...  -1.1613  1.1618  0.4275
 0.0564 -1.0614  0.1452  ...  -0.7359 -0.2980 -1.9538
 0.8342  0.2849  0.5471  ...  -0.9297 -0.3760  0.4382
          ...             ⋱             ...          
 0.5107  1.6905  0.3308  ...  -0.8220  0.7505 -0.9616
 1.5038  0.3528 -1.4010  ...  -0.9663 -0.7744 -0.5839
-0.7513  1.6879 -0.1883  ...   1.1898  0.5734  0.1458
[torch.FloatTensor of size 818x200]
, batch_sizes=[100, 100, 100, 100, 100, 99, 77, 58, 35, 25, 11, 5, 3, 2, 2, 1])

Any idea what the bug is?


(Suthee) #7

PackedSequence(data=Variable containing:
1
4
6
2
5
3
[torch.FloatTensor of size 6x1]
, batch_sizes=[3, 2, 1])

I think the output is expected and correct.
The feature dimension is 1, so each row of the 6x1 tensor represents one token of the given sequences; there are 6 tokens in total across 3 sequences. batch_sizes = [3, 2, 1] also makes sense: the first iteration fed to the RNN should contain the first tokens of all 3 sequences ([1, 4, 6]). For the next iteration, a batch size of 2 means the second tokens of the first two sequences ([2, 5]), because the last sequence only has length 1.

Did I misunderstand something here? Thanks!


(Suthee) #8

I am new here, so I might be wrong, but I think this is the correct behavior. There are 818 tokens/elements, and each has 200 dimensions.

np.sum([100, 100, 100, 100, 100, 99, 77, 58, 35, 25, 11, 5, 3, 2, 2, 1]) is 818.
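The same arithmetic can be checked on a small stand-in batch (the shapes below are made up, since the original 100x16x200 data isn't available):

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

# Small stand-in for the 100x16x200 batch: 4 sequences, feature dim 3.
lengths = [5, 4, 2, 1]
batch = torch.zeros(len(lengths), max(lengths), 3)
for i, n in enumerate(lengths):
    batch[i, :n] = torch.randn(n, 3)

pack = pack_padded_sequence(batch, lengths, batch_first=True)

# The packed data has sum(lengths) rows -- exactly one per real token --
# and the batch_sizes entries sum to that same number.
print(pack.data.shape)              # torch.Size([12, 3])
print(int(pack.batch_sizes.sum()))  # 12 == sum(lengths)
```

So an 818x200 data tensor just means 818 real tokens across the 100 sequences, each with a 200-dimensional embedding: not a bug.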