I have a dataset with 11k rows and 15 columns.
It's time-series data: each row corresponds to one event in time, and there can be multiple events in a single day, so I did not use shuffle when creating the DataLoader. The task is classification with 7 classes. I used a plain deep (feed-forward) net and it didn't give good results, and I read that an LSTM could help since it has a memory element. I'd like to use an LSTM in my model, in the hope that it captures relationships in the past data and predicts the classes better.
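To make the setup concrete, here is a simplified sketch of the kind of model I have in mind (all shapes and names are placeholders matching my data, with fake random tensors standing in for the real dataset). It builds sliding windows over the rows so each sample is a sequence of past events:

```python
import torch
import torch.nn as nn

# Placeholder shapes matching my data: 11k rows, 15 feature columns, 7 classes.
N_ROWS, N_FEATURES, N_CLASSES = 11000, 15, 7
SEQ_LEN = 8  # placeholder window size; this is part of what I'm asking about

# Fake data standing in for my real dataset (rows are in time order).
X = torch.randn(N_ROWS, N_FEATURES)
y = torch.randint(0, N_CLASSES, (N_ROWS,))

# Sliding windows: each sample is a sequence of SEQ_LEN consecutive rows;
# the label is the class of the last row in the window.
windows = X.unfold(0, SEQ_LEN, 1).permute(0, 2, 1)  # (N_ROWS - SEQ_LEN + 1, SEQ_LEN, N_FEATURES)
labels = y[SEQ_LEN - 1:]

class LSTMClassifier(nn.Module):
    def __init__(self, n_features, hidden_size, n_classes):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, n_classes)

    def forward(self, x):
        # hidden/cell state default to zeros each forward pass (one of my questions)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])  # classify from the last timestep

model = LSTMClassifier(N_FEATURES, 32, N_CLASSES)
logits = model(windows[:4])
print(logits.shape)
```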
1. What should the sequence length be for the LSTM layer? Should it be 1, corresponding to each row?
2. Should I be initializing the hidden state and cell state on each forward pass, or carrying the last computed hidden and cell state values over to the next forward pass?
3. I read that weights are shared across timesteps, but since a batch of data is fed at a time, how do these weights get shared?
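For question 2, here is a minimal sketch of the two options I mean (hypothetical shapes, fake data):

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=15, hidden_size=32, batch_first=True)

x1 = torch.randn(4, 8, 15)  # one batch of sequences
x2 = torch.randn(4, 8, 15)  # the next batch in time order

# Option A: fresh zero state on each forward pass
# (the default in PyTorch when no state is passed).
out_a, _ = lstm(x1)

# Option B: carry the last state forward to the next batch ("stateful"),
# detaching so backprop doesn't reach into the previous batch's graph.
_, (h, c) = lstm(x1)
out_b, _ = lstm(x2, (h.detach(), c.detach()))

print(out_a.shape, out_b.shape)
```

Is option B appropriate here, given that my batches follow each other in time?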