How to use LSTM for my use case?

I have a dataset with 11k rows and 15 columns.
It’s time series data: each row corresponds to one event in time, and there can be multiple events in a single day. Hence, when creating the data loader, I did not use shuffle. I have a classification task with 7 classes. I used a deep (feed-forward) net for it, which didn’t give very good results, and I read that an LSTM could be helpful as it has a memory element. I wish to use an LSTM in my model, in the hope that it captures relationships in the past data and helps me predict the classes better.

My questions

  1. What should the seq length value be in the LSTM layer? Should it be 1, corresponding to each row?
  2. Should I be initialising the hidden state and cell state in each forward pass, or use the last computed hidden state and cell state values for the next forward pass?
  3. I read that weights are shared across timesteps, but since a batch of data is fed in at a time, how do these weights get shared?
  1. Number of events. “seq length” is never explicitly specified, as RNNs work in “one step at a time” mode. You can even batch sequences of varying lengths with nn.utils.rnn.pack_padded_sequence. In general, a data loader for RNNs is a bit tricky (as you may want to sample timeseries and/or subsequences, depending on the task).
  2. You don’t pass states across iterations, as each new mini-batch implies drawing a new set of samples (timeseries or random start points).
  3. Same as with other NNs: weights are not sample-specific, but transform data across the batch dimension in the same way. Sharing across timesteps additionally implies that you can take timesteps with new events for as long as you need.
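To illustrate point 1, here is a minimal sketch of batching event sequences of varying lengths with pack_padded_sequence (all shapes and sizes here are illustrative; only the 15 features come from the original post):

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

# Three event sequences of different lengths, 15 features per event
seqs = [torch.randn(n, 15) for n in (5, 3, 8)]
lengths = torch.tensor([s.size(0) for s in seqs])

# Pad to a common length -> (batch, max_len, features), then pack
padded = pad_sequence(seqs, batch_first=True)
packed = pack_padded_sequence(padded, lengths, batch_first=True, enforce_sorted=False)

lstm = nn.LSTM(input_size=15, hidden_size=32, batch_first=True)
_, (h_n, c_n) = lstm(packed)  # h_n holds one final state per sequence
print(h_n.shape)  # torch.Size([1, 3, 32]) -> (num_layers, batch, hidden)
```

Note that no seq length is configured on the LSTM itself; the lengths live entirely in the data.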

Currently I’m saving the hidden state and cell state from one forward pass into a list, and then for the next forward pass I use those hidden and cell states as the initial state of the LSTM. Is this correct? I hope this will help capture the dependency across rows in my dataset; could someone confirm?
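Roughly, what I’m doing looks like this (a simplified sketch; the shapes are illustrative, and I detach the states so the backward pass doesn’t try to reach into earlier batches’ freed graphs):

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=15, hidden_size=32, batch_first=True)
state = None  # first forward pass starts from the default zero state

for _ in range(3):  # stand-in for iterating over my DataLoader
    batch = torch.randn(4, 1, 15)  # (batch, seq_len=1, features)
    out, state = lstm(batch, state)
    # keep the values but cut the autograd history before the next iteration
    state = tuple(s.detach() for s in state)
```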

Thank you so much for your answer; it seems I wrote my reply at the same time as you (what a coincidence).

So the take-away here is that seq length doesn’t actually matter, and we basically have this seq length parameter because we sometimes can’t fit a huge sequence in memory at once?

I’m sorry, it’s just a bit hard for me to understand this LSTM concept properly, so bear with me. What I think right now is: is the hidden state and cell state at the end of sequence 1 fed as the initial hidden state and cell state for sequence 2 in the same batch?
Weight updates happen per batch, right? I’m struggling to visualize how the long-term dependency is maintained when we use batches.

You said I shouldn’t pass states across iterations, but I hoped that my data point r (row number r) could be predicted better if the model has seen all rows before r. Hence I am currently passing the hidden state and cell state across each forward pass (i.e. across batches, right?).
Could you shed some clarity on this, please?

Batches are mainly a way to accelerate training; to reason about the mechanics, think about using batch_size=1.

You start from a blank state, accumulate a variable number of events into a fixed-size state, and then use it for classification or whatever.

If that’s unclear: 3d tensors with an additional time dimension are used to feed events in, instead of “batches of events”.
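In code, that “blank state, accumulate events, classify” pattern might look like this (a sketch; the 7 classes and 15 features come from your post, everything else is assumed):

```python
import torch
import torch.nn as nn

class EventClassifier(nn.Module):
    """Many-to-one: a sequence of events in, one class prediction out."""
    def __init__(self, n_features=15, hidden=64, n_classes=7):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):  # x: (batch, seq_len, n_features)
        _, (h_n, _) = self.lstm(x)  # blank initial state on every forward pass
        return self.head(h_n[-1])   # classify from the final hidden state

model = EventClassifier()
logits = model(torch.randn(8, 20, 15))  # 8 sequences of 20 events each
print(logits.shape)  # torch.Size([8, 7])
```

The time dimension sits inside each sample; the batch dimension just stacks independent sequences.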

Your “state preservation” approach, even if it works, won’t allow making out-of-sample predictions.

I guess you’re referring to “seq length” in some data loader implementation; then yes, this is mostly correct. You also shouldn’t treat your data as one “huge sequence”, because a model trained this way won’t generalize to other sequences.
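Concretely, “seq length” in such a data loader usually means a sliding window over the long series; a sketch (window size, placeholder data and shapes are all arbitrary choices, not from your setup):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class WindowDataset(Dataset):
    """Cuts one long (T, features) series into fixed-length subsequences."""
    def __init__(self, data, labels, seq_len=32):
        self.data, self.labels, self.seq_len = data, labels, seq_len

    def __len__(self):
        return len(self.data) - self.seq_len + 1

    def __getitem__(self, i):
        window = self.data[i : i + self.seq_len]
        target = self.labels[i + self.seq_len - 1]  # label of the last row in the window
        return window, target

data = torch.randn(11000, 15)           # placeholder for an 11k-row, 15-column dataset
labels = torch.randint(0, 7, (11000,))  # placeholder labels, 7 classes
loader = DataLoader(WindowDataset(data, labels), batch_size=64, shuffle=True)
x, y = next(iter(loader))
print(x.shape, y.shape)  # torch.Size([64, 32, 15]) torch.Size([64])
```

Note that shuffle=True is fine here: each window already carries its own history, so the windows themselves are independent samples (the “random start points” mentioned above).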


Thank you so much for the reply. I have been trying to understand sequences properly, and how to actually prepare the data input/label pairs for sequence modelling tasks, and now I have a bit more clarity. I have mainly been debugging sine wave prediction. I now have more questions about this, which I will open as a new topic; I hope I get to understand the concepts better.
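In case it helps anyone else: for next-step sine wave prediction, the input/label preparation I’m experimenting with is just a one-step shift of each window (a sketch; the window size of 50 is an arbitrary choice):

```python
import torch

t = torch.linspace(0, 20, 1000)
wave = torch.sin(t)

seq_len = 50
# input: a window of the wave; target: the same window shifted one step ahead
inputs = torch.stack([wave[i : i + seq_len] for i in range(len(wave) - seq_len)])
targets = torch.stack([wave[i + 1 : i + seq_len + 1] for i in range(len(wave) - seq_len)])
print(inputs.shape, targets.shape)  # both torch.Size([950, 50])
```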