Why do we need to pad variable-length sequences for an LSTM when there is a pack_padded_sequence function that essentially tells the LSTM to ignore the padded portion? Why isn't pack_padded_sequence, together with the sequence lengths, sufficient for training with mini-batches? Why do we need to pre-pad the input?
Why isn't pack_padded_sequence sufficient? Why do we need to pad by ourselves when we supply the sequence lengths?
Hopefully we'll get an answer some day.
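For what it's worth, here is a minimal sketch of the two routes being contrasted. As far as I can tell, pack_padded_sequence expects an already padded, rectangular tensor as its input and only uses the lengths to decide which time steps to keep, so it cannot take a ragged list by itself. If you don't want to pad by hand, torch.nn.utils.rnn.pack_sequence accepts a list of variable-length tensors directly. The toy sequences, feature size, and hidden size below are made up for illustration.

```python
import torch
from torch import nn
from torch.nn.utils.rnn import (
    pad_sequence,
    pack_padded_sequence,
    pack_sequence,
    pad_packed_sequence,
)

torch.manual_seed(0)

# Three hypothetical sequences of lengths 4, 2 and 3, with 5 features per step.
seqs = [torch.randn(4, 5), torch.randn(2, 5), torch.randn(3, 5)]
lengths = torch.tensor([len(s) for s in seqs])

lstm = nn.LSTM(input_size=5, hidden_size=8, batch_first=True)

# Route 1: pad to a rectangular (batch, max_len, features) tensor, then pack.
# The lengths tell pack_padded_sequence which time steps are real data.
padded = pad_sequence(seqs, batch_first=True)
packed_a = pack_padded_sequence(
    padded, lengths, batch_first=True, enforce_sorted=False
)

# Route 2: hand the ragged list to pack_sequence, no manual padding.
packed_b = pack_sequence(seqs, enforce_sorted=False)

# Both packed batches go through the same LSTM.
out_a, (h_a, _) = lstm(packed_a)
out_b, (h_b, _) = lstm(packed_b)

# Convert the packed output back to a padded tensor plus the original lengths.
unpacked, out_lengths = pad_packed_sequence(out_a, batch_first=True)
```

In this sketch both routes produce the same packed data and the same final hidden states, and the unpacked output is zero at the positions past each sequence's length. So the manual padding step is only there to build the rectangular tensor that pack_padded_sequence takes as input; the LSTM itself never processes the padded steps.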