Beginner here so please bear with me. I’m adapting this LSTM tutorial to predict a time series instead of handwritten numbers.
In the original problem (using MNIST) there are 60000 28 * 28 images that are used to train the network. These get reshaped into a 28 * 60000 * 28 tensor to be ingested by the model.
My original data is a one dimensional time series with shape (40000, ). With a batch size of 20 I reshape to (5, 8000, 1) tensor corresponding to (timesteps, batches, features).
I’m trying to build an LSTM that takes 5 timesteps and predicts the “next” one, using a hidden layer of dimension 128 (5 --> 128 --> 1), but I’m getting a mismatch when I run the code. I can solve the problem but I don’t quite get what is going on.
I’m getting the following error:
RuntimeError: size mismatch, m1: [20 x 1], m2: [5 x 512]
20 is the batch size I defined
1 is the sequence length (number of features, only time-series itself at this point)
5 is the number of timesteps
512 is 128 * 4 but not sure where this comes from (why have 4 times the dimension of the hidden layer?)
So obviously if I change the sequence length to 5 it works, but I’m confused because then I would have an input tensor with shape (5, 1600, 5) and not the desired (5, 8000, 1).
The new shape doesn’t seem right because I want to take 5 data points to predict the 6th, so it should be a 5 x 1 vector that maps to an scalar, not a 5 x 5 grid that maps to a scalar (like the 28 x 28 grid in the original MNIST code).
What am I not getting?
Thanks for any insight.

