Understanding pack_padded_sequence and pad_packed_sequence

matthew_zeng · September 1, 2017, 12:18am

Right, you don’t have to use pack_padded_sequence. Padding alone is fine, but it behaves differently from pack_padded_sequence: for packed input, the RNN will not perform any calculation on pad elements.

For example, suppose you have a padded mini-batch of size 2, where zero is padding:

1 1 1
1 0 0

With the padded batch, the output will be 3 (seq length) x 2 (batch size). With packed input, however, the result is a packed output that contains only the valid steps (3 x 1 for the first sequence and 1 x 1 for the second). If you feed the packed batch into the RNN, it will not calculate output for your pad elements. Moreover, hidden will be the hidden state after the last valid input of each sequence instead of the hidden state after the last zero padding (which is what you get if you feed the padded batch into the RNN).
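
Here is a minimal sketch of that comparison. It assumes each number in the batch is a 1-dimensional input feature, uses the default time-major layout (seq length x batch x features), and picks hidden_size=4 arbitrarily; none of these choices come from the original question.

```python
import torch
from torch import nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)

# The padded batch from the example: lengths 3 and 1, zero is padding.
# Transposed to time-major, shape (seq_len=3, batch=2, input_size=1).
padded = torch.tensor([[1.0, 1.0, 1.0],
                       [1.0, 0.0, 0.0]]).t().unsqueeze(-1)
lengths = torch.tensor([3, 1])  # sorted in decreasing order (the default requirement)

rnn = nn.RNN(input_size=1, hidden_size=4)  # hidden_size is an arbitrary choice

# 1) Plain padded input: the RNN also steps through the pad elements.
out_pad, h_pad = rnn(padded)
print(out_pad.shape)            # torch.Size([3, 2, 4])

# 2) Packed input: only the 3 + 1 = 4 valid steps are computed.
packed = pack_padded_sequence(padded, lengths)
out_packed, h_packed = rnn(packed)
print(packed.batch_sizes)       # tensor([2, 1, 1])
print(out_packed.data.shape)    # torch.Size([4, 4])

# Unpack back to a padded tensor; pad positions are filled with zeros.
out_unpacked, out_lengths = pad_packed_sequence(out_packed)

# For the short sequence, the packed hidden state is the output at its
# last valid step (t=0), while the padded hidden state is the output at t=2.
print(torch.allclose(h_packed[0, 1], out_unpacked[0, 1]))
print(torch.allclose(h_pad[0, 1], out_pad[2, 1]))
```

For the longer sequence there is no padding, so both variants should agree on its outputs and final hidden state; only the short sequence differs.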

Given a plain padded batch, the RNN actually does not distinguish pad and valid elements, and it performs the same calculation on both. You may need to clean the output (e.g., mask it) to get the result you want. A dynamic RNN (one fed with packed input) does not have this problem.
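
If you stay with plain padding, the cleanup could look roughly like the sketch below. It is one possible way to mask, not the only one; the mask construction, the variable names, and hidden_size=4 are illustrative assumptions.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Same padded batch as above, time-major: (seq_len=3, batch=2, input_size=1).
padded = torch.tensor([[1.0, 1.0, 1.0],
                       [1.0, 0.0, 0.0]]).t().unsqueeze(-1)
lengths = torch.tensor([3, 1])

rnn = nn.RNN(input_size=1, hidden_size=4)  # hidden_size is an arbitrary choice
out, _ = rnn(padded)                       # (3, 2, 4), pad steps included

# mask[t, b] is True while time step t lies inside sequence b.
steps = torch.arange(padded.size(0)).unsqueeze(1)   # (3, 1)
mask = steps < lengths.unsqueeze(0)                 # (3, 2)

# Zero out the outputs that were computed on pad elements.
masked_out = out * mask.unsqueeze(-1).to(out.dtype)

# Pick the output at the last valid step of each sequence, which plays
# the role of the final hidden state for a single-layer, one-directional RNN.
batch_idx = torch.arange(out.size(1))
last_valid = out[lengths - 1, batch_idx]            # (2, 4)
```

Note that masking only cleans the result; the RNN still spends computation on the pad steps, which packing avoids.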