Speed of Multi-layer LSTMs using PackedSequence

Hi all,

I have a multi-layer LSTM, and I expected training to be faster with a packed sequence than with tensors padded to the longest sequence length. However, a comparison shows that the padded input is slightly faster than the packed one. In my understanding, the packed input should run fewer loop iterations inside each layer, shouldn't it?
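(For reference, here is a minimal sketch, on a toy batch separate from the benchmark below, of what I mean: pack_sequence stores only the real timesteps, and its batch_sizes field tells the RNN how many sequences are still active at each step.)

import torch
from torch.nn.utils.rnn import pack_sequence

# three sequences of lengths 3, 2, and 1, already sorted by decreasing length
seqs = [torch.ones(3, 1), 2 * torch.ones(2, 1), 3 * torch.ones(1, 1)]
packed = pack_sequence(seqs)

print(packed.data.shape)   # torch.Size([6, 1]): only 3 + 2 + 1 = 6 real steps are stored
print(packed.batch_sizes)  # tensor([3, 2, 1]): active sequences at each timestep

# a padded batch would run 3 sequences * 3 timesteps = 9 cell updates,
# whereas the packed batch runs only 6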

Thanks!

Do you have the code for the experiments? I would be quite interested, since I am using packed sequences too.

Here is the test code:

import time
import numpy as np
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_sequence, pad_sequence

device = torch.device("cuda")

batch_size = 32
input_size = 100
hidden_size = 512
seq_len_range = (50, 200)
epoch = 10

rnn = nn.LSTM(input_size, hidden_size, num_layers=4, bias=True)
rnn.to(device)

# generate random batches of variable-length sequences, sorted by
# decreasing length as pack_sequence requires
inputs = []
for _ in range(epoch):
    seqs = [torch.rand((np.random.randint(*seq_len_range), input_size)) for _ in range(batch_size)]
    seqs = sorted(seqs, key=lambda x: x.size(0), reverse=True)
    inputs.append(seqs)

# padded input: every batch is padded to the length of its longest sequence
start = time.time()
for seqs in inputs:
    x = pad_sequence(seqs).to(device)
    y = rnn(x)
end = time.time()
print(f"elapsed time for padded input: {end - start} secs")

# packed input: padding timesteps are skipped entirely
start = time.time()
for seqs in inputs:
    x = pack_sequence(seqs).to(device)
    y = rnn(x)
end = time.time()
print(f"elapsed time for packed input: {end - start} secs")

The result was:

elapsed time for padded input: 0.6328160762786865 secs
elapsed time for packed input: 0.6410703659057617 secs

Interestingly, in CPU mode, the packed input is faster than the padded one:

elapsed time for padded input: 27.869688272476196 secs
elapsed time for packed input: 20.38231635093689 secs
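One caveat on these timings: CUDA kernels are launched asynchronously, so time.time() can return before the queued LSTM kernels have actually finished, and the loop also measures the packing/padding and host-to-device copies rather than the RNN alone. A minimal sketch of a stricter GPU measurement, assuming the same rnn, inputs, and device as above, packs outside the timed region and synchronizes before reading the clock:

import time
import torch
from torch.nn.utils.rnn import pack_sequence

# assumes rnn, inputs, and device are defined as in the script above
packed_inputs = [pack_sequence(seqs).to(device) for seqs in inputs]

torch.cuda.synchronize()  # drain pending GPU work before starting the clock
start = time.time()
for x in packed_inputs:
    y = rnn(x)
torch.cuda.synchronize()  # wait for the queued LSTM kernels to finish
end = time.time()
print(f"elapsed time for packed input (synchronized): {end - start} secs")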