Mismatched order of hidden states in batch processing

Actually, I was confused about the shape of the output after pad_packed_sequence: it is (sequence length, batch size, feature dimension). Thanks, it's solved now.

Hello.

I was looking at the output of an RNN batch processing experiment with variable-length sequences. For simplicity, I set the batch size to 5 for the experiment.

I gave a PackedSequence as input to an LSTM and got the corresponding outputs, then called pad_packed_sequence to get the zero-padded output. However, I noticed something suspicious about the outputs ("output" and "h_n", in the terms of the official documentation): their orders do not match. "output" appears to be in descending order of sequence length, while "h_n" appears to be in ascending order.
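For reference, this is roughly the pipeline I am describing (a minimal sketch; the random input, the lengths [5, 4, 3, 2, 1], and the dimensions are just illustrative, not my actual data or code):

import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

# Illustrative setup (not my real data): 5 sequences of descending lengths,
# input size 5, hidden size 5, single-layer unidirectional LSTM.
input_size, hidden_size, batch_size = 5, 5, 5
lengths = [5, 4, 3, 2, 1]          # sorted in descending order before packing
max_len = lengths[0]

padded_input = torch.randn(max_len, batch_size, input_size)  # (seq_len, batch, input_size)
lstm = nn.LSTM(input_size, hidden_size)

packed_input = pack_padded_sequence(padded_input, lengths)
packed_output, (h_n, c_n) = lstm(packed_input)
output, out_lengths = pad_packed_sequence(packed_output)

print(output.shape)  # torch.Size([5, 5, 5]) -> (seq_len, batch, hidden), time step first
print(h_n.shape)     # torch.Size([1, 5, 5]) -> (num_layers * num_directions, batch, hidden)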

For example, the "output" sequences look like the array below (they appear to be ordered by descending sequence length):

Variable containing:
(0 ,.,.) =
0.1831 0.0601 0.0732 -0.2483 0.0915
0.0179 -0.1223 -0.1160 0.0603 -0.0097
0.1965 0.1548 0.1344 -0.3234 0.2234
0.0126 -0.0113 -0.0182 -0.0683 0.1041
0.2266 0.0178 0.0531 -0.2717 -0.0327

(1 ,.,.) =
0.2787 0.0372 0.0821 -0.3525 0.0581
0.1780 -0.0718 -0.0946 -0.0953 -0.1266
0.1563 0.1133 0.0656 -0.2600 0.2187
0.1629 0.0827 0.0734 -0.3044 0.2054
0.0000 0.0000 0.0000 0.0000 0.0000

(2 ,.,.) =
0.2552 -0.0048 0.0069 -0.2237 0.0642
0.0956 -0.1890 -0.1316 0.0115 0.0096
0.1921 0.0276 -0.0050 -0.2566 0.1653
0.0000 0.0000 0.0000 0.0000 0.0000
0.0000 0.0000 0.0000 0.0000 0.0000

(3 ,.,.) =
0.2368 -0.0836 -0.0937 -0.0731 -0.0004
0.2364 -0.0329 -0.0408 -0.2081 0.0208
0.0000 0.0000 0.0000 0.0000 0.0000
0.0000 0.0000 0.0000 0.0000 0.0000
0.0000 0.0000 0.0000 0.0000 0.0000

(4 ,.,.) =
0.3238 -0.0324 -0.0530 -0.2849 -0.1660
0.0000 0.0000 0.0000 0.0000 0.0000
0.0000 0.0000 0.0000 0.0000 0.0000
0.0000 0.0000 0.0000 0.0000 0.0000
0.0000 0.0000 0.0000 0.0000 0.0000

However, the "h_n" output appears to be in ascending order of sequence length:

Variable containing:
(0 ,.,.) =
0.3238 -0.0324 -0.0530 -0.2849 -0.1660
0.2364 -0.0329 -0.0408 -0.2081 0.0208
0.1921 0.0276 -0.0050 -0.2566 0.1653
0.1629 0.0827 0.0734 -0.3044 0.2054
0.2266 0.0178 0.0531 -0.2717 -0.0327

Doesn’t this affect the final result if I only want to use the "output" at the last time step (that is, h_n) for my experiment? If it does, how should the target labels be ordered so that the model works correctly? The input batch was originally ordered by descending sequence length, but h_n seems to come out in ascending order of sequence length.
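To make the question concrete, here is the kind of comparison I have in mind, continuing the sketch above (assuming a single-layer, unidirectional LSTM and the hypothetical `lengths` list from before):

# Gather "output" at the last valid time step of each sequence (in the
# original batch order) and compare it with h_n.
last_step = torch.tensor(lengths) - 1              # (batch,) index of last valid time step
batch_idx = torch.arange(batch_size)               # (batch,)
last_outputs = output[last_step, batch_idx, :]     # (batch, hidden)

print(torch.allclose(last_outputs, h_n[0]))        # expected True if the ordering is consistent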

Thanks in advance!