Longer running time when using pack_padded_sequence


I have implemented the pack_padded_sequence version of the SNLI classifier based on the code from examples: https://github.com/pytorch/examples/tree/master/snli.

My code is here: https://github.com/OanaCamburu/SNLI

The running time for an epoch (all else being equal) is about 4 times higher, while performance has remained essentially the same.

For the longer running time, I suppose it’s because there is more switching between CPU and GPU, since the necessary sortings (https://github.com/OanaCamburu/SNLI/blob/master/train_packedseq.py#L20 and https://github.com/OanaCamburu/SNLI/blob/master/train_packedseq.py#L36) are executed on the CPU. Is it possible to make the code more efficient? Is there any intention to support pack_padded_sequence directly inside RNNs in future releases?
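For reference, here is a minimal pure-Python sketch of the sort/unsort bookkeeping that pack_padded_sequence requires: batches must be ordered by decreasing length before packing, and the outputs restored to the original order afterwards. Plain lists stand in for tensors here, and the function name `sort_by_length` is just illustrative, not from the repo or the PyTorch API.

```python
def sort_by_length(sequences):
    """Return sequences sorted by decreasing length, plus the
    permutation and its inverse (used to restore original order)."""
    # Indices of sequences, longest first (what pack_padded_sequence expects).
    order = sorted(range(len(sequences)),
                   key=lambda i: len(sequences[i]), reverse=True)
    # Inverse permutation: maps each original position to its sorted position.
    inverse = [0] * len(order)
    for new_pos, old_pos in enumerate(order):
        inverse[old_pos] = new_pos
    return [sequences[i] for i in order], order, inverse


seqs = [[1, 2], [3, 4, 5, 6], [7]]
sorted_seqs, order, inverse = sort_by_length(seqs)
# sorted_seqs is [[3, 4, 5, 6], [1, 2], [7]]

# After running the RNN on the packed (sorted) batch, the outputs are
# put back into the original order via the inverse permutation:
restored = [sorted_seqs[inverse[i]] for i in range(len(seqs))]
assert restored == seqs
```

Because these index computations run on the CPU (and the resulting index tensors then have to be moved to the GPU for the gather/scatter), they can add noticeable per-batch overhead, which may be part of the slowdown.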

As for the lack of improvement in accuracy, I know this wasn’t guaranteed, but I still expected a bit of an increase. I’m curious whether anyone has experienced the same behaviour in other tasks.
