LSTM Input after Spatial CNN

I am working on a model that takes a number of frames as input (nBCHW) and outputs a steering angle prediction. The input goes through a slice-by-slice conv and then through an LSTM layer to smooth the prediction. The LSTM input is 3D (seq_length, batch, input_size), so before feeding the tensor to the LSTM I reshaped it to (1, nB, input_size = C*H*W). I am a bit confused about seq_length: what should it mean in such an architecture when I want to capture the temporal info between frames with the LSTM?

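The sequence length here should be your nB, and the batch size for the LSTM should be 1, because you have only one sequence of length nB. In other words, the LSTM input should have shape (nB, 1, input_size) rather than (1, nB, input_size).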
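
To make the shapes concrete, here is a minimal sketch, assuming PyTorch and its default `nn.LSTM` layout of (seq_len, batch, input_size). The class name `FrameCNNLSTM`, the layer sizes, and the pooling step are hypothetical choices for illustration, not the original poster's model. The frames dimension is kept as the sequence axis and a batch axis of size 1 is inserted before the LSTM.

```python
import torch
import torch.nn as nn


class FrameCNNLSTM(nn.Module):
    """Hypothetical per-frame CNN followed by an LSTM over the frame axis."""

    def __init__(self, in_channels=3, hidden_size=64):
        super().__init__()
        # Per-frame feature extractor (sizes are illustrative assumptions).
        self.cnn = nn.Sequential(
            nn.Conv2d(in_channels, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        # batch_first defaults to False, so the LSTM expects
        # input of shape (seq_len, batch, input_size).
        self.lstm = nn.LSTM(input_size=8 * 4 * 4, hidden_size=hidden_size)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, frames):
        # frames: (nB, C, H, W), one video clip of nB consecutive frames.
        feats = self.cnn(frames)            # (nB, 8, 4, 4)
        feats = feats.flatten(start_dim=1)  # (nB, 128)
        seq = feats.unsqueeze(1)            # (nB, 1, 128): seq_len=nB, batch=1
        out, _ = self.lstm(seq)             # (nB, 1, hidden_size)
        angles = self.head(out)             # (nB, 1, 1)
        return angles.squeeze(-1).squeeze(-1)  # (nB,): one angle per frame


model = FrameCNNLSTM()
frames = torch.randn(8, 3, 32, 32)  # 8 frames of a single sequence
angles = model(frames)
print(angles.shape)
```

With the (1, nB, input_size) layout from the question, the LSTM would instead see nB independent sequences of length 1, so no state would be carried from one frame to the next.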

Thank you so much for the insight! I changed it and the loss stopped oscillating and started to decrease!