Frame-to-frame prediction using a vanilla LSTM

I have frames from a video as images with shape [batch_size=100, channels=2, image_height=64, image_width=64]; a single batch consists of a sequence of 100 frames. I want to predict one frame ahead using a vanilla LSTM. So far I have transformed my input sequence to [batch_size, channels * image_height, image_width].

  1. How should I transform my data to be able to train such an architecture?
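
One common way to frame this (a sketch, not the only option): treat the 100 frames as a single sequence, flatten each frame into a feature vector of length channels * image_height * image_width = 8192, and build input/target pairs by shifting the sequence by one step, so the model learns to map frame t to frame t+1. The PyTorch model below is a minimal illustration; the class name `NextFramePredictor` and the hidden size of 256 are my own assumptions, not anything prescribed.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for one video: 100 frames, 2 channels, 64x64
frames = torch.randn(100, 2, 64, 64)

seq_len, channels, H, W = frames.shape
features = channels * H * W  # 2 * 64 * 64 = 8192 per frame

# Shift by one step to get (input, target) pairs:
# inputs  = frames 0..98, targets = frames 1..99.
# LSTM with batch_first=True expects (batch, seq_len, features);
# here the whole video is one sequence, so batch = 1.
inputs = frames[:-1].reshape(1, seq_len - 1, features)
targets = frames[1:].reshape(1, seq_len - 1, features)

class NextFramePredictor(nn.Module):  # hypothetical name
    def __init__(self, features, hidden=256):  # hidden size is an assumption
        super().__init__()
        self.lstm = nn.LSTM(features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, features)  # project back to frame size

    def forward(self, x):
        out, _ = self.lstm(x)      # (batch, seq_len, hidden)
        return self.head(out)      # (batch, seq_len, features)

model = NextFramePredictor(features)
pred = model(inputs)               # one prediction per input frame
loss = nn.functional.mse_loss(pred, targets)
print(pred.shape)  # torch.Size([1, 99, 8192])
```

The predicted vectors can be reshaped back to images with `pred.reshape(-1, channels, H, W)`. If you have many videos, stack them along the batch dimension instead of using batch = 1.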