Improve LSTM Training Speed

Hi everyone,

I have implemented a simple Many-to-One LSTM Encoder-Classifier.

The model takes a packed sequence as input (as my input data has variable length) and outputs the probabilities for the target classes. The input sequences are rather long (about 3000 data points).

I am running the training on a 16" MacBook Pro (6-Core Intel Core i7, AMD Radeon Pro 5300M 4 GB), but unfortunately the training seems extremely slow (up to 45 minutes per epoch).

As I have never worked with recurrent neural networks (or LSTMs) before, it is hard for me to determine whether this is a usual duration for backpropagation through time or whether something is wrong with my implementation.

Can anybody please help me?

Here is my model:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.utils.rnn import pad_packed_sequence

class MyModel(nn.Module):
    def __init__(self, input_features, output_features, n_classes):
        super(MyModel, self).__init__()
        self.encoder = nn.LSTM(input_features, output_features, 1,
                               batch_first=True, dropout=0, bidirectional=False)
        self.cla = nn.Linear(output_features, n_classes)

    def forward(self, x: torch.nn.utils.rnn.PackedSequence) -> torch.Tensor:
        z, _ = self.encoder(x)
        # unpack to (batch, max_len, hidden) plus the true lengths
        z_unpacked, lens_unpacked = pad_packed_sequence(z, batch_first=True)
        # pick the output at the last valid time step of each sequence
        last_elements = z_unpacked[torch.arange(z_unpacked.shape[0]), lens_unpacked - 1]
        y = self.cla(last_elements)
        y = F.softmax(y, dim=1)
        return y

  • What do you mean by “3000 data points”? Is this the sequence length of a single input? The maximum sequence length?

  • About your setup: which device is available in your PyTorch installation, CPU or GPU?

I have not tried your model, but it looks simple and shouldn’t be the bottleneck. However, if “… sequences are rather long (about 3000 data points)” means a sequence length of about 3000, then running an RNN over sequences that long with the limited compute of your MacBook is the most likely reason it is slow.

Perhaps your problem here is how you’re processing your inputs, e.g., pre-processing, padding, batching. I suggest optimizing your data pipeline to fit the computing capabilities of your MacBook.
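A quick way to check this is to time one forward/backward pass of a bare LSTM over a 3000-step sequence (the batch size and hidden size below are made up for illustration). Multiplying the result by your number of batches per epoch gives a rough lower bound on your epoch time:

```python
import time
import torch
import torch.nn as nn

# made-up sizes roughly matching the description
batch, seq_len, features, hidden = 8, 3000, 8, 64
lstm = nn.LSTM(features, hidden, 1, batch_first=True)
x = torch.randn(batch, seq_len, features)

start = time.perf_counter()
out, _ = lstm(x)        # 3000 sequential recurrence steps
out.sum().backward()    # backpropagation through time over all steps
elapsed = time.perf_counter() - start
print(f"one forward+backward over {seq_len} steps: {elapsed:.2f}s")
```

If this single pass already takes on the order of seconds on your CPU, the epoch time is dominated by the recurrence itself rather than by the data pipeline.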

First of all thanks for your answer.

  1. Yes the sequence length is about 3000 for all input samples
  2. I cannot run PyTorch on the GPU, so the available device is the CPU (2.6 GHz 6-Core Intel Core i7)

As I had never worked with packed sequences in PyTorch, I was not sure whether the packing and unpacking of the sequences was the bottleneck of the training. However, as I have now tested my pipeline on another dataset with a much shorter sequence length (and it ran much faster), I assume my implementation is alright.
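For anyone who runs into the same question later, the check I did amounts to something like this (sizes are illustrative): timing the same LSTM on a short and a long sequence shows how strongly the runtime depends on sequence length.

```python
import time
import torch
import torch.nn as nn

lstm = nn.LSTM(8, 32, 1, batch_first=True)

def time_forward(seq_len, batch=4):
    # time one inference pass over a random batch of the given length
    x = torch.randn(batch, seq_len, 8)
    start = time.perf_counter()
    with torch.no_grad():
        lstm(x)
    return time.perf_counter() - start

time_forward(10)  # warm-up
short = time_forward(100)
long = time_forward(2000)
print(f"len 100: {short:.3f}s, len 2000: {long:.3f}s")
```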

Thanks for your help! :slight_smile: