Why is packing/unpacking a sequence slower than running the padded sequence directly?

Several articles online suggest that packing a padded sequence before passing it to an RNN saves compute, because the RNN then skips the padded timesteps.
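To make the expected saving concrete, here is a minimal pure-Python sketch (no PyTorch needed) of the counting argument as I understand it. The lengths are made-up numbers and the helper names `padded_steps` and `packed_batch_sizes` are hypothetical. The per-timestep counts are, as far as I know, the same kind of information a packed sequence keeps in its `batch_sizes`.

```python
# Hypothetical sequence lengths for one batch (made-up numbers, longest first).
lengths = [10, 7, 3, 2]


def padded_steps(lengths):
    """Cell evaluations when every sequence is run out to the longest length."""
    return len(lengths) * max(lengths)


def packed_batch_sizes(lengths):
    """Number of sequences that are still active at each timestep."""
    return [sum(1 for n in lengths if n > t) for t in range(max(lengths))]


batch_sizes = packed_batch_sizes(lengths)

print(padded_steps(lengths))  # 40 cell evaluations with padding
print(sum(batch_sizes))       # 22 cell evaluations when pads are skipped
print(batch_sizes)            # [4, 4, 3, 2, 2, 2, 2, 1, 1, 1]
```

This only counts RNN cell evaluations. It does not account for the cost of the packing and unpacking steps themselves.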

Based on that, I assumed that using pack_padded_sequence should be faster than a forward pass over the whole sequence including the pads. However, I’m not observing that on my actual dataset, so I created fake data to reproduce the problem.

Please let me know if I’m missing something here.