LSTM batch size vs sequence length

I am new to PyTorch and am trying to do some time series prediction for speech. The dimension of each datapoint is 384. What I am confused about is whether the memory of the LSTM is separate for each sequence in the batch, or whether the batch is basically treated as one long sequence. In other words, in a simplified example, suppose the input to our LSTM has shape (batch size, sequence length, input size), and suppose we want our LSTM to retain memory of 50 datapoints. Is there a difference between an input shape of (1, 50, 384) and (50, 1, 384)? Does the latter mean that each datapoint is treated independently of the others?

Could someone please help with this? Thanks in advance!


Yes. The batch is not treated as one long sequence: each sequence in the batch gets its own hidden and cell state, so with shape (50, 1, 384) each datapoint is processed independently, starting from a fresh (zero) state. What is shared across the batch is the transition itself, i.e. the LSTM's weight matrices: the same transformation is applied at every step to different vectors (the accumulated context concatenated with the step input). The "memory" you refer to lives in the per-sequence hidden and cell state, so only the (1, 50, 384) layout carries context across the 50 steps.
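You can check this directly. A minimal sketch (using small toy sizes instead of input size 384, and `batch_first=True` so the shapes match the question's (batch, seq, input) layout): the same 5 input vectors are fed once as a single length-5 sequence and once as a batch of 5 length-1 sequences. Only the first step matches, because the batched version resets the state for every datapoint.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy sizes for illustration; the question uses input size 384.
lstm = nn.LSTM(input_size=4, hidden_size=3, batch_first=True)

x = torch.randn(1, 5, 4)  # one sequence of 5 steps

# Case 1: shape (1, 5, 4) -> one sequence; memory carried across the 5 steps.
out_seq, _ = lstm(x)

# Case 2: shape (5, 1, 4) -> a batch of 5 length-1 sequences;
# each starts from its own fresh zero hidden/cell state.
out_batch, _ = lstm(x.reshape(5, 1, 4))

# Step 0 is identical (no history yet in either case) ...
print(torch.allclose(out_seq[0, 0], out_batch[0, 0]))  # True
# ... but step 1 differs, because only case 1 remembers step 0.
print(torch.allclose(out_seq[0, 1], out_batch[1, 0]))  # False
```

If you do want memory to persist across batches of length-1 inputs, you would have to pass the returned `(h, c)` state back into the next `lstm(...)` call yourself.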