I am trying to implement an LSTM model to predict the next day's stock price using a sliding window. I have previously implemented this in Keras, where the LSTM layer expects a 3D input of shape (batch_size, timesteps, features). I have read through tutorials and watched videos on the PyTorch LSTM model, and I still can’t understand how to implement it. I am going to make up some stock data to use as an example so we can be on the same page. I have a tensor filled with data points incremented by hour (time), e.g.:
data_from_csv = [[time, open, close, high, low, volume], [1,10,11,15,9,100], [2,11,12,16,9,100], [3,12,13,17,9,100], [4,13,14,18,9,100], [5,14,15,19,9,100], [6,15,16,10,9,100]]
If my window size is 2, then I would take consecutive pairs of rows (rows 1-2, 2-3, 3-4, 4-5, 5-6), giving me 5 windows of shape (2, 6). In Keras that would be an input of shape (5, 2, 6), where 5 is the number of samples (windows), 2 is the window length (timesteps), and 6 is the number of features. I don’t think I fully understand what PyTorch expects for each input/parameter.
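The windowing described above can be sketched in plain Python as follows. This is only an illustration: the helper name `make_windows` is made up for this example, and the header row of data_from_csv is dropped so that only the six numeric rows remain.

```python
# Numeric rows from data_from_csv (header row dropped).
# Column order: [time, open, close, high, low, volume]
rows = [
    [1, 10, 11, 15, 9, 100],
    [2, 11, 12, 16, 9, 100],
    [3, 12, 13, 17, 9, 100],
    [4, 13, 14, 18, 9, 100],
    [5, 14, 15, 19, 9, 100],
    [6, 15, 16, 10, 9, 100],
]


def make_windows(rows, window_size):
    """Return every run of `window_size` consecutive rows (stride 1)."""
    return [rows[i:i + window_size] for i in range(len(rows) - window_size + 1)]


windows = make_windows(rows, window_size=2)

# number of windows, rows per window, features per row
print(len(windows), len(windows[0]), len(windows[0][0]))  # 5 2 6
```

Each window overlaps the previous one by one row, which is why 6 rows yield 5 windows rather than 3.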
In the PyTorch docs for nn.LSTM, the parameters are:
input_size: the number of expected features
For my data that would be [time, open, close, high, low, volume], i.e. an input_size of 6 different features per timestep.
hidden_size: number of features in the hidden state
I take this as how many LSTM units are in each hidden layer, and therefore how many outputs the first layer will have.
num_layers: number of recurrent layers
I take this as the number of hidden LSTM layers, so if num_layers = 1 then I will have an input, one hidden layer, and an output layer.
There are more parameters listed but then it describes the input as:
where input is of shape (seq_len, batch, input_size)
the docs then go on to use the example:
rnn = nn.LSTM(10, 20, 2)
input = torch.randn(5, 3, 10)
h0 = torch.randn(2, 3, 20)
c0 = torch.randn(2, 3, 20)
output, (hn, cn) = rnn(input, (h0, c0))
from what I can tell this is what it would look like with the parameter names substituted into the values:
rnn = nn.LSTM(input_size, hidden_size, num_layers)
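To make the substitution concrete, here is a small sketch that gives every number in the docs example a name and prints the resulting shapes. The variable names are chosen only for illustration, and the default layout (batch_first=False) is assumed.

```python
import torch
import torch.nn as nn

# Same numbers as the docs example, with names attached (illustrative only).
seq_len, batch, input_size = 5, 3, 10
hidden_size, num_layers = 20, 2

rnn = nn.LSTM(input_size, hidden_size, num_layers)  # batch_first=False by default
x = torch.randn(seq_len, batch, input_size)

# If no initial state is passed, h0 and c0 default to zeros.
output, (hn, cn) = rnn(x)

print(output.shape)  # torch.Size([5, 3, 20]) -> (seq_len, batch, hidden_size)
print(hn.shape)      # torch.Size([2, 3, 20]) -> (num_layers, batch, hidden_size)
print(cn.shape)      # torch.Size([2, 3, 20]) -> (num_layers, batch, hidden_size)
```

The last dimension of h0 and c0 has to equal hidden_size, and their first dimension has to equal num_layers; only the input's last dimension is input_size.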
When they instantiate the “input” variable, only one number matches (the 10), so I assume it represents the columns in a data set, i.e. the features. So if I were to use PyTorch’s example with my data_from_csv above, with 2 hidden layers whose hidden size is double the number of input features, would it look as follows?
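import torch
import torch.nn as nn
rnn = nn.LSTM(6, 12, 2)
# one window of 2 timesteps as a batch of 1: shape (seq_len=2, batch=1, input_size=6)
input = torch.tensor([[[1, 10, 11, 15, 9, 100]], [[2, 11, 12, 16, 9, 100]]], dtype=torch.float32)
h0 = torch.randn(2, 1, 12)
c0 = torch.randn(2, 1, 12)
output, (hn, cn) = rnn(input, (h0, c0))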
When PyTorch asks for “seq_len”, is it asking for how long each window is, or how many windows are in the dataset?
When PyTorch asks for “batch”, is it asking for the whole window of data?
When PyTorch asks for “features”, is it talking about the number of columns of data, like time, open, close, etc.? Is that the same as “input_size”?
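One possible reading of these three terms, applied to the stock example, is sketched below. It assumes batch_first=True, so the input is laid out as (batch, seq_len, input_size), the same ordering Keras uses; with the default batch_first=False the first two dimensions would be swapped. Under that assumption, seq_len is the window length (2), batch is the number of windows fed in at once (5), and input_size is the number of columns (6). The variable names are illustrative.

```python
import torch
import torch.nn as nn

# Numeric rows from data_from_csv: [time, open, close, high, low, volume]
rows = torch.tensor([
    [1, 10, 11, 15, 9, 100],
    [2, 11, 12, 16, 9, 100],
    [3, 12, 13, 17, 9, 100],
    [4, 13, 14, 18, 9, 100],
    [5, 14, 15, 19, 9, 100],
    [6, 15, 16, 10, 9, 100],
], dtype=torch.float32)

window_size = 2
# Stack every run of 2 consecutive rows into one tensor of windows.
windows = torch.stack(
    [rows[i:i + window_size] for i in range(len(rows) - window_size + 1)]
)
print(windows.shape)  # torch.Size([5, 2, 6]) -> (batch, seq_len, input_size)

# Assumption: batch_first=True so the batch dimension comes first.
lstm = nn.LSTM(input_size=6, hidden_size=12, num_layers=2, batch_first=True)
output, (hn, cn) = lstm(windows)

print(output.shape)  # torch.Size([5, 2, 12]) -> (batch, seq_len, hidden_size)
print(hn.shape)      # torch.Size([2, 5, 12]) -> (num_layers, batch, hidden_size)
```

Note that hn and cn keep the (num_layers, batch, hidden_size) layout even when batch_first=True; that flag only affects the input and output tensors.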
The more I look at this the less I think I understand it. Let me know if I’m on the right track and please define each term based on my stock example data so I can understand. I almost guarantee I will have some clarifying questions but I cannot find a good tutorial or example that uses multivariate input and any help would be appreciated. Thanks.