Hi there, I’m new to PyTorch (and the community!).
Sorry in advance if this is a silly question, but as I’m getting my feet wet with LSTMs and learning PyTorch at the same time, I’m confused about how nn.LSTM ingests its inputs. From the main PyTorch tutorial and the time sequence prediction example, it looks like the input for an LSTM is a 3-dimensional tensor, but I cannot understand why.
At the end of this thread it is mentioned that the three elements of the input are time dimension (5), feature dimension (3) and mini-batch dimension (100). The suggested 3d tensor is 5 x 100 x 3, which corresponds to time x batch x features.
I’m having trouble understanding this.
If I have a single time series of length 10000, but use mini batches of, say, length 50, and have no feature inputs other than the series itself, would this translate to a ? x 50 x 1 tensor?
What is referred to as “time” in the thread above?
Is the batch size the size of the sliding window of observations that I will use to predict the “next” observed datapoint?
I think my misunderstanding is related to the fact that the time sequence prediction example seems to be predicting multiple sine waves for each point in time, and in this thread @spro says that “any kind of continued sequence prediction counts as many-to-many”, which just adds to my confusion.
I feel my questions all stem from a misunderstanding of something very basic and connected.
The minibatch dimension refers to the number of sequences you want to process in parallel. So if you divide a time series of length 10000 into chunks of length 50, your input tensor would be 50 (timesteps) by 200 (batch size) by 1 (features).
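To make the shapes concrete, here is a minimal sketch of the chunking described above. The toy `series` (just the numbers 0 to 9999) and the variable names are assumptions for illustration, not something from the thread.

```python
import torch

# Toy stand-in for a univariate time series of length 10000 (assumption).
series = torch.arange(10000, dtype=torch.float32)

seq_len = 50                             # timesteps per chunk
batch_size = series.numel() // seq_len   # 200 chunks processed in parallel

# Row i of `chunks` holds the i-th run of 50 consecutive values: shape (200, 50).
chunks = series.reshape(batch_size, seq_len)

# nn.LSTM's default layout is (seq_len, batch, input_size), so move time to
# the front and add a trailing feature dimension of size 1.
x = chunks.transpose(0, 1).unsqueeze(-1)

print(x.shape)  # torch.Size([50, 200, 1])
```

Note that calling `series.reshape(50, 200, 1)` directly yields the same shape but not the same layout: each column would then hold every 200th value instead of 50 consecutive ones.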
That’s very helpful, thanks for taking the time to answer. In terms of performance, are there any best practices for the relationship between timesteps and batch size (high/low, low/high, etc.)?
Let’s say we have some time series data in a 2D dataframe, 10,000 rows, 1 data column, i.e. 10k x 1.
To turn this into a batch for torch.nn.LSTM, we split it into 200 sequences, so in 2D form that’s 200 dataframes each with dimension 50 x 1.
Since torch.nn.LSTM needs a 3D tensor, we reshape this frame to dimension 50 x 200 x 1 and use this entire 3D tensor as the input for LSTM's forward function.
@zhidali The input needs to be a 3D tensor with dimensions (seq_len, batch_size, input_size), that is: length of your input sequence, batch size, and number of features (if you only have the time series, there is only 1 feature). If you train with a batch of size 1, the input tensor would be 50x1x1. I’m also learning, but I think it’s accurate.
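As a quick check of the (seq_len, batch_size, input_size) layout, the sketch below feeds random tensors through nn.LSTM. The `hidden_size` of 16 is an arbitrary choice for illustration, not a value from the thread.

```python
import torch
import torch.nn as nn

seq_len, batch_size, input_size = 50, 200, 1
hidden_size = 16  # arbitrary value, chosen only for this illustration

# Default layout: input is (seq_len, batch, input_size).
lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size)
x = torch.randn(seq_len, batch_size, input_size)
output, (h_n, c_n) = lstm(x)
print(output.shape)  # torch.Size([50, 200, 16]): one output per timestep per sequence
print(h_n.shape)     # torch.Size([1, 200, 16]): final hidden state per sequence

# Batch of size 1: a single 50-step sequence.
out_single, _ = lstm(torch.randn(seq_len, 1, input_size))
print(out_single.shape)  # torch.Size([50, 1, 16])

# With batch_first=True the expected layout becomes (batch, seq_len, input_size).
lstm_bf = nn.LSTM(input_size=input_size, hidden_size=hidden_size, batch_first=True)
out_bf, _ = lstm_bf(x.transpose(0, 1))
print(out_bf.shape)  # torch.Size([200, 50, 16])
```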
Hi Alex, thanks for the reply.
Over the past two days I figured out my model: 200 batches, each of shape 50x1x1.
I set my batch size to 1, but I can still control the sequence length. I use DataLoader, which helps me divide my data into several batches; when I train, I just set the sequence length to the same number as the batch size in the DataLoader.
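One possible reading of this setup, sketched under assumptions: the DataLoader's `batch_size` is used as the sequence length, and each group of 50 consecutive observations is reshaped into a single 50x1x1 sequence. The toy `series` and the use of `TensorDataset` are illustrative choices, not necessarily what the poster did.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in for the 10,000-point series (assumption).
series = torch.arange(10000, dtype=torch.float32)
seq_len = 50

# Each dataset item is one observation of shape (1,).
dataset = TensorDataset(series.unsqueeze(-1))

# shuffle=False keeps observations in time order; the DataLoader "batch"
# of 50 items is then treated as one sequence of 50 timesteps.
loader = DataLoader(dataset, batch_size=seq_len, shuffle=False, drop_last=True)

shapes = []
for (chunk,) in loader:
    # chunk has shape (50, 1); insert a batch dimension of size 1 in the
    # middle to get (seq_len=50, batch=1, input_size=1).
    x = chunk.unsqueeze(1)
    shapes.append(tuple(x.shape))

print(len(shapes), shapes[0])  # 200 (50, 1, 1)
```

With this arrangement the LSTM sees one sequence at a time, so nothing is processed in parallel across sequences.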
Just for reference about the batch size problem, here’s a recent OpenAI post/paper about just that. TL;DR You wanna get the gradient noise scale / batch size near 1.