Train LSTM with multiple sequences per parameter update

I’m training a network to perform time-series forecasting. My current implementation works, but it loops through my sequences one at a time, yielding one parameter update per input sequence per epoch. I want to mini-batch train so there is a parameter update after evaluating 32 sequences (to hopefully improve training time and reduce noise), but I can’t figure out how to implement the DataLoader / training loop to do this. I would also like to use the DataLoader so I can shuffle the training data.

NOTE: I am not talking about mini-batching the sequences themselves. Each LSTM input is still a tensor of size [100, 1, 8] representing [sequence length, sequence mini-batch size, number of features]. I want to train on 32 of these LSTM inputs per parameter update.
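For reference, my current loop looks roughly like this minimal sketch (the LSTM, linear head, loss, and dummy data are stand-ins just to illustrate the shapes, not my actual model):

```python
import torch
import torch.nn as nn

# Stand-in model mirroring the shapes from the question:
# each input sequence is [seq_len=100, batch=1, features=8].
model = nn.LSTM(input_size=8, hidden_size=16)
head = nn.Linear(16, 1)  # placeholder forecasting head
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(list(model.parameters()) + list(head.parameters()))

sequences = [torch.randn(100, 1, 8) for _ in range(64)]  # dummy data
targets = [torch.randn(1, 1) for _ in range(64)]

for seq, target in zip(sequences, targets):
    optimizer.zero_grad()
    out, _ = model(seq)       # out: [100, 1, 16]
    pred = head(out[-1])      # last time step -> [1, 1]
    loss = criterion(pred, target)
    loss.backward()
    optimizer.step()          # one parameter update per input sequence
```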

Maybe I’m misunderstanding something or I’m missing a simple solution.

Any help is greatly appreciated!

I’m probably a bit confused, but don’t you simply want batches of size 32 with shape (seq_len, batch_size, input_dim), which would be (100, 32, 8) in your case? This is essentially the standard way to feed an LSTM.
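For instance, a minimal sketch along those lines (the dataset sizes, model, and target shapes are made up for illustration):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Dummy data: 640 sequences, each [seq_len=100, features=8].
X = torch.randn(640, 100, 8)
y = torch.randn(640, 1)

# DataLoader handles both the batching and the shuffling.
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

model = nn.LSTM(input_size=8, hidden_size=16)
head = nn.Linear(16, 1)  # placeholder forecasting head
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(list(model.parameters()) + list(head.parameters()))

for xb, yb in loader:
    xb = xb.permute(1, 0, 2)  # [32, 100, 8] -> [100, 32, 8] = (seq_len, batch, features)
    optimizer.zero_grad()
    out, _ = model(xb)        # out: [100, 32, 16]
    pred = head(out[-1])      # last time step -> [32, 1]
    loss = criterion(pred, yb)
    loss.backward()
    optimizer.step()          # one parameter update per 32 sequences
```

(Constructing the LSTM with `batch_first=True` would let you drop the `permute` and feed the `[32, 100, 8]` batches directly.)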

What exactly is stopping you?

I appreciate the reply. I had actually misunderstood what the mini-batch dimension of the LSTM input was doing. You’re correct. Thanks!