I need to train a CNN-RNN model together for feature extraction of a video

I am a little how to build a custom data loader which can provide me sequential images. Would be really helpful if someone could help me out.