Trying to understand PackedSequence and pack_padded_sequence

Jesper_Rix · September 25, 2017, 6:39am

Hi guys,

I’m new to PyTorch and I’m trying to understand PackedSequence.

I have a batch of sentences of variable length that I want to convert into a PackedSequence so I can feed them into an RNN.

The longest sentence in the batch has 11 words. Each word vector has dimension [1, 240].

I have padded each sentence to the length of the longest one, so my input tensor is:

# batch_size, max_len, num_features
batch_in.size()
torch.Size([100, 11, 240])

packed_seq = nn.utils.rnn.pack_padded_sequence(batch_in, batch_in_lengths, batch_first=True)

And the output of packed_seq:

PackedSequence(data=Variable containing:
 3.1625e-01 -4.8614e-01  4.9205e-01  ...  -3.2214e-01 -1.0503e+00 -7.7392e-01
-2.3958e-01 -5.7964e-01 -1.1852e+00  ...  -4.7232e-02 -8.6795e-01  9.3440e-03
-2.3958e-01 -5.7964e-01 -1.1852e+00  ...  -4.7232e-02 -8.6795e-01  9.3440e-03
                ...                   ⋱                   ...                
-5.1679e-01  7.7076e-01  4.5970e-01  ...  -3.2855e-01  8.4874e-01  2.6562e-01
 7.2816e-01 -5.6678e-01 -1.5125e+00  ...   3.1464e-01 -6.4921e-01 -1.1999e+00
 1.2031e+00 -1.0457e-01  5.2758e-01  ...  -5.4365e-01  7.4296e-01  1.9522e-01
[torch.DoubleTensor of size 379x240]
, batch_sizes=[100, 100, 100, 42, 13, 10, 7, 3, 2, 1, 1])

Can somebody explain to me why batch_sizes looks the way it does?

chenyuntc · September 25, 2017, 11:10am

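Well, the batch_sizes here is not actually the batch size of batch_in. You can also pack the same data from a time-major tensor:

packed_seq = nn.utils.rnn.pack_padded_sequence(batch_in.transpose(0, 1), batch_in_lengths, batch_first=False)

Either way, only the first entry of batch_sizes (100 here) is the familiar batch size; the later entries shrink as the shorter sentences end.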
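As a quick check of what batch_sizes contains, here is a minimal sketch with a tiny made-up batch (the tensor values and lengths are my own assumptions, not the data from the question). It packs the same data once batch-first and once time-major and compares the results. As far as I can tell, both layouts give the same packed data and the same batch_sizes, because batch_sizes counts how many sequences are still active at each time step.

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

# Hypothetical toy batch: 3 sequences, padded to length 4, 2 features each.
# By default the lengths must be sorted in decreasing order.
lengths = [4, 2, 1]
batch_in = torch.arange(24, dtype=torch.float32).reshape(3, 4, 2)

# Same data, two layouts: (batch, time, features) and (time, batch, features).
packed_bf = pack_padded_sequence(batch_in, lengths, batch_first=True)
packed_tm = pack_padded_sequence(batch_in.transpose(0, 1), lengths, batch_first=False)

print(packed_bf.batch_sizes)  # tensor([3, 2, 1, 1])
print(packed_bf.data.shape)   # torch.Size([7, 2]), 7 == 4 + 2 + 1
print(torch.equal(packed_bf.data, packed_tm.data))  # True
```

Note that the number of rows in data equals the sum of the lengths, which is also the sum of batch_sizes.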

josmi9966 · February 4, 2018, 7:09pm

I also do not understand where the batch_sizes shown for the packed sequence come from.

Also, I would like to know how best to “unsort” the result of running the packed sequence through an LSTM and then unpacking it. The unsorting needs to preserve the gradients, and I am not sure of the best way to do that.

olegboiko · October 29, 2019, 5:37am

An RNN produces one hidden state per time step, so the number of hidden states equals the length of the longest input. Let’s assume we train an RNN on sentences from some large corpus of text. If the longest sentence in a batch contains N words, then the RNN will produce N hidden states for that batch.

Sentences are fed into the RNN in batches, but different sentences may have different lengths. Consider the following batch of 2 sentences:
Sentence 1: "I love cats"
Sentence 2: "I hope this answer helps"
(For brevity, I omit the tensor representation of these sentences, with word embeddings etc.)

This batch will produce 5 hidden states, because 5 is the length of the longest sentence in the batch:
len(["i", "hope", "this", "answer", "helps"]) == 5

Now the RNN has to feed information into its first hidden state. It takes the first element of each sequence in the batch and feeds them into the first hidden state: ["i", "i"].
The second hidden state will receive ["love", "hope"].
The third hidden state will receive ["cats", "this"].
But the fourth hidden state will receive just ["answer"], because the first sequence "i love cats" ended at step #3, and the fifth will receive just ["helps"].
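The walk-through above can be reproduced in a few lines of plain Python. This is only an illustration of the grouping, not what PyTorch does internally; the variable names are my own, and I keep the sentence order from the example (PyTorch itself expects the longest sequence first by default).

```python
from itertools import zip_longest

sentences = [
    ["i", "love", "cats"],
    ["i", "hope", "this", "answer", "helps"],
]

# Group the words by time step; sentences that have already ended
# contribute nothing (zip_longest fills their slots with None).
steps = [
    [word for word in column if word is not None]
    for column in zip_longest(*sentences)
]

for t, words in enumerate(steps, start=1):
    print(t, words)
# 1 ['i', 'i']
# 2 ['love', 'hope']
# 3 ['cats', 'this']
# 4 ['answer']
# 5 ['helps']
```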

batch_sizes is a list of the sizes of the inputs each hidden state receives. For this example, batch_sizes would be [2, 2, 2, 1, 1]. Each element in this list tells how many input elements go into the corresponding hidden state of the RNN.

I guess the confusion stems from the fact that the term batch is overloaded here. There is the batch of sequences that goes into the RNN, and then there is the batch of inputs that goes into each step of the RNN.
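To make the relationship between sentence lengths and batch_sizes concrete, here is a small pure-Python sketch. Both helper functions are hypothetical names I made up for this post, not part of PyTorch. The first one derives batch_sizes from a list of lengths; the second one goes the other way and is applied to the batch_sizes printed in the original question.

```python
def batch_sizes_from_lengths(lengths):
    """For each time step, count how many sequences are still running."""
    return [sum(1 for n in lengths if n > t) for t in range(max(lengths))]


def length_counts_from_batch_sizes(batch_sizes):
    """Return {length: number of sequences with exactly that length}."""
    counts = {}
    for t, size in enumerate(batch_sizes):
        next_size = batch_sizes[t + 1] if t + 1 < len(batch_sizes) else 0
        if size > next_size:
            # size - next_size sequences end exactly at step t + 1
            counts[t + 1] = size - next_size
    return counts


# The two-sentence example: lengths 3 and 5.
print(batch_sizes_from_lengths([3, 5]))  # [2, 2, 2, 1, 1]

# The batch_sizes from the original question.
question_sizes = [100, 100, 100, 42, 13, 10, 7, 3, 2, 1, 1]
print(sum(question_sizes))  # 379, the number of rows in the packed data
print(length_counts_from_batch_sizes(question_sizes))
# {3: 58, 4: 29, 5: 3, 6: 3, 7: 4, 8: 1, 9: 1, 11: 1}
```

So in the question's batch, every sentence has at least 3 words, 58 of the 100 sentences have exactly 3, and only one reaches the full 11.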