Is there any case of training an RNN without a packed sequence?

As the title says: is there any real use case for training an RNN without using packed_sequence?
If the batch size is > 1, a batch will usually contain training samples with different sequence lengths, so padding is required. Are there applications where simply feeding the padded tensor into the RNN, without using pack_padded_sequence, still works?


Using pack_padded_sequence when batches contain sequences of different lengths is not strictly mandatory. Just try it with and without. In many cases (I'm mainly talking about classification here!) I don't think you will see major differences in the results.
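A minimal sketch of what "with and without" means for a unidirectional LSTM, assuming PyTorch and a hypothetical toy batch of random features (the data, sizes, and the `gather` step are illustrative, not from the original post). It runs the same LSTM once on the padded tensor and once on a packed version of it, then compares final hidden states:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

torch.manual_seed(0)

# Toy batch (hypothetical data): three sequences of 4-dim feature vectors, lengths 5, 3, 2.
seqs = [torch.randn(5, 4), torch.randn(3, 4), torch.randn(2, 4)]
lengths = torch.tensor([len(s) for s in seqs])

# Right-pad with zeros to shape (batch, max_len, features) = (3, 5, 4).
padded = pad_sequence(seqs, batch_first=True)

lstm = nn.LSTM(input_size=4, hidden_size=8, batch_first=True)

# 1) Without packing: the LSTM also steps through the zero padding.
out_padded, (h_padded, _) = lstm(padded)

# 2) With packing: each sequence stops at its true length.
packed = pack_padded_sequence(padded, lengths, batch_first=True, enforce_sorted=False)
_, (h_packed, _) = lstm(packed)

# Read the padded run's output at each sequence's last real time step.
idx = (lengths - 1).view(-1, 1, 1).expand(-1, 1, out_padded.size(2))
last_real = out_padded.gather(1, idx).squeeze(1)

# Same values as the packed run's final hidden state ...
print(torch.allclose(h_packed[-1], last_real, atol=1e-5))
# ... whereas the padded run's final hidden state has also consumed the padding.
print(torch.allclose(h_packed[-1], h_padded[-1], atol=1e-5))
```

The first check should print True and the second False. For a unidirectional RNN with right padding, the outputs at the real time steps are not affected by the trailing padding; what changes is the final hidden state h_n, which is often exactly what a classifier reads. For a bidirectional RNN the backward direction starts on the padding, so there the effect is not limited to h_n.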

Ideally, the network will learn to more or less ignore the special padding token (e.g., <pad>). In practice, though, the padding will probably always have some effect.
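One mechanism that may help on the input side, assuming the padded batch consists of token ids fed through `nn.Embedding`: PyTorch's `padding_idx` argument initializes the `<pad>` embedding to zeros and gives it no gradient, so training does not update it. This only fixes the input vector; the RNN's biases and recurrent weights still act on the padded steps, so it does not by itself remove the effect described above. `PAD_IDX = 0`, the vocabulary size, and the token ids are hypothetical:

```python
import torch
import torch.nn as nn

PAD_IDX = 0  # hypothetical vocabulary index reserved for <pad>

emb = nn.Embedding(num_embeddings=10, embedding_dim=3, padding_idx=PAD_IDX)

# A freshly constructed Embedding has an all-zero <pad> row.
print(emb.weight[PAD_IDX])

# Padded batch of token ids: the second sequence is right-padded with PAD_IDX.
batch = torch.tensor([[4, 7, 2], [5, 1, PAD_IDX]])
emb(batch).sum().backward()

# The <pad> row receives a zero gradient; used tokens (e.g. id 4) get a non-zero one.
print(emb.weight.grad[PAD_IDX])
print(emb.weight.grad[4])
```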

How important using pack_padded_sequence is will most likely depend on the training data. Most notably, batches where the sequence lengths vary greatly and/or are very skewed (e.g., 99 sequences with around 10 items and 1 sequence with 100 items, so the other 99 sequences have to be padded a lot) will arguably suffer the most. That's why existing solutions try to ensure that batches are relatively homogeneous:
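To put a number on the skewed example, here is a small sketch that computes the share of padding in a right-padded batch. It assumes "around 10 items" means exactly 10; `padding_fraction` is a hypothetical helper, not a library function:

```python
def padding_fraction(lengths):
    """Fraction of slots in a right-padded batch that are padding."""
    max_len = max(lengths)
    total_slots = max_len * len(lengths)
    real_tokens = sum(lengths)
    return (total_slots - real_tokens) / total_slots


# The skewed batch: 99 sequences of 10 items plus one of 100 items.
skewed = [10] * 99 + [100]
print(padding_fraction(skewed))  # 0.891

# A homogeneous batch for comparison.
print(padding_fraction([10] * 100))  # 0.0
```

In the skewed batch 8,910 of the 10,000 slots (about 89%) are padding, so without packing the RNN spends most of its steps on <pad>.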

  • The BucketIterator of torchtext defines an iterator that batches sequences of similar lengths together. This minimizes the amount of padding needed while producing freshly shuffled batches for each new epoch.
  • See also this older post, where we came up with an iterator that ensures that each batch contains sequences of the same length. While this might yield batches that are not full, for large datasets this issue is negligible.
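The same-length idea can be sketched in plain Python. This is a hypothetical helper (`same_length_batches`), not the code from the older post and not torchtext's API: it groups dataset indices by sequence length, splits each group into batches (the last batch of a group may be smaller), and shuffles both within groups and across batches, so calling it once per epoch with a different seed gives freshly shuffled batches:

```python
import random
from collections import defaultdict


def same_length_batches(lengths, batch_size, seed=None):
    """Return a list of index batches where all sequences in a batch share one length."""
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for idx, n in enumerate(lengths):
        buckets[n].append(idx)

    batches = []
    for idxs in buckets.values():
        rng.shuffle(idxs)  # new order within each length bucket
        for start in range(0, len(idxs), batch_size):
            batches.append(idxs[start:start + batch_size])  # last one may be smaller
    rng.shuffle(batches)  # mix batches of different lengths
    return batches


lengths = [3, 5, 3, 3, 5, 7, 3]
for batch in same_length_batches(lengths, batch_size=2, seed=0):
    print(batch, [lengths[i] for i in batch])
```

Because every batch has a single length, no padding is needed at all; the price is some batches smaller than `batch_size`, which matters less as the dataset grows.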

Summing up:

  • Padding is not necessarily bad (i.e., packing is not necessarily needed).
  • Simply try with and without packing to see the effects.
  • Try approaches that automatically minimize the required padding.

I hope that helps.