How to input image sequences to a CNN+LSTM?

The CNN will only look at each image separately from the others, so it won’t treat them as a sequence unless you concatenate them somehow, for example along the channel dimension. It’s a little unclear what you’re trying to accomplish, but from my understanding you want the CNN to extract features from each image and then run the LSTM on those features.
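Just to illustrate the channel-concatenation option mentioned above, here is a rough sketch. The sizes (4 RGB frames of 32x32, 8 output channels) are made up for the example and are not from your setup.

```python
import torch
import torch.nn as nn

# Hypothetical sizes chosen only for illustration.
batch_size, seq_len, H, W = 2, 4, 32, 32
frames = torch.randn(batch_size, seq_len, 3, H, W)

# Fold the sequence dimension into the channel dimension:
# (batch_size, 4, 3, H, W) -> (batch_size, 12, H, W).
stacked = frames.reshape(batch_size, seq_len * 3, H, W)

# The first conv layer must then expect seq_len * 3 input channels.
conv = nn.Conv2d(in_channels=seq_len * 3, out_channels=8, kernel_size=3, padding=1)
out = conv(stacked)
print(out.shape)
```

With this approach the CNN sees all frames at once, but the sequence length is baked into the first layer, so it only works for a fixed number of frames.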

So what I’m thinking you could do is what you said: process each image separately by doing x.view(batch_size*4, 3, H, W), assuming your input has shape (batch_size, 4, 3, H, W). Once the CNN has processed every image you will have shape (batch_size*4, C, H_new, W_new), which you can reshape for the LSTM by doing x.view(batch_size, 4, -1) if you’re using batch_first=True for the LSTM. That gives shape (batch_size, 4, C*H_new*W_new), so the LSTM’s input_size needs to be C*H_new*W_new. The LSTM will then process the 4 feature vectors as a sequence.
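Here is a minimal sketch of that reshape, CNN, reshape, LSTM flow. The layer sizes, the CNNLSTM class name, the classification head and the 32x32 input are all placeholders I picked for the example, so swap in your own CNN and dimensions.

```python
import torch
import torch.nn as nn


class CNNLSTM(nn.Module):
    """Hypothetical example model: per-frame CNN features fed to an LSTM."""

    def __init__(self, hidden_size=64, num_classes=10):
        super().__init__()
        # Small placeholder CNN; the adaptive pool fixes the output at (16, 4, 4).
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(8, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        # input_size must equal C * H_new * W_new = 16 * 4 * 4.
        self.lstm = nn.LSTM(input_size=16 * 4 * 4, hidden_size=hidden_size,
                            batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # x: (batch_size, seq_len, 3, H, W)
        batch_size, seq_len, c, h, w = x.shape
        # Merge batch and sequence so the CNN sees each frame independently.
        x = x.reshape(batch_size * seq_len, c, h, w)
        feats = self.cnn(x)  # (batch_size * seq_len, 16, 4, 4)
        # Split batch and sequence again, flattening each frame's features.
        feats = feats.reshape(batch_size, seq_len, -1)
        out, _ = self.lstm(feats)  # (batch_size, seq_len, hidden_size)
        # Use the output at the last time step for the prediction.
        return self.fc(out[:, -1])


model = CNNLSTM()
frames = torch.randn(2, 4, 3, 32, 32)  # 2 sequences of 4 RGB frames
logits = model(frames)
print(logits.shape)
```

I used reshape instead of view here because it also works when the tensor is not contiguous; view is fine too if your tensor is contiguous.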
