LSTM for images with more than one channel

I’m trying to use a vanilla LSTM to model video frame sequences as a benchmark approach.

I have an image data set with input dimensions (batch_size=256, channels=2, height=64, width=64), and I want to reshape it to match the LSTM input format (sequence_length, batch_size, input_size).

I want to pass the input images in row by row, so that each image row is one step of the sequence.
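A minimal sketch of that row-by-row reshape, assuming PyTorch and assuming the two channels of each row are concatenated into one feature vector (so sequence_length=64 and input_size=2*64=128). The helper name `images_to_row_sequence` and `hidden_size=32` are placeholders for illustration only:

```python
import torch


def images_to_row_sequence(images: torch.Tensor) -> torch.Tensor:
    """Reshape (batch, channels, height, width) to (height, batch, channels * width)."""
    batch, channels, height, width = images.shape
    # Move height to the front so that each image row becomes one time step.
    rows = images.permute(2, 0, 1, 3)  # (height, batch, channels, width)
    # Concatenate the channels of each row into a single feature vector.
    return rows.reshape(height, batch, channels * width)


images = torch.randn(256, 2, 64, 64)   # (batch_size, channels, height, width)
seq = images_to_row_sequence(images)   # (sequence_length, batch_size, input_size)
print(seq.shape)                       # torch.Size([64, 256, 128])

# hidden_size=32 is an arbitrary placeholder, not a recommended value.
lstm = torch.nn.LSTM(input_size=128, hidden_size=32)
output, (h_n, c_n) = lstm(seq)
print(output.shape)                    # torch.Size([64, 256, 32])
```

The permute has to come before the reshape: calling `reshape` or `view` directly on the (256, 2, 64, 64) tensor would give the right shape but mix pixels from different rows and images. An alternative would be to treat each channel as its own sequence, but that changes the batch dimension rather than input_size.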

Here is my implementation of the LSTM I’m using: