Video Dataloading

I have a dataset of videos, and I want to feed them to a CNN frame by frame (the temporal part is handled by an LSTM). Do I need to write a custom dataset for this, or is there a better option? If I do need a custom dataset, how should I handle the conversion from a video to an array of images?
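
For concreteness, here is a rough sketch of what the custom-dataset route might look like. It is only an illustration under some assumptions: the videos are decoded with OpenCV (`opencv-python`), each clip is reduced to a fixed number of evenly spaced frames, and the names `VideoFrameDataset`, `read_frames_cv2`, `sample_indices`, and `num_frames` are all hypothetical. The class only defines `__len__` and `__getitem__`, which is the map-style interface a PyTorch `Dataset` is expected to provide.

```python
from typing import Callable, List, Sequence


def read_frames_cv2(path: str) -> list:
    """Decode every frame of a video into an RGB array of shape (H, W, 3)."""
    import cv2  # assumption: opencv-python is installed

    capture = cv2.VideoCapture(path)
    frames = []
    while True:
        ok, frame = capture.read()
        if not ok:  # end of stream or decode failure
            break
        # OpenCV returns BGR; most pretrained CNNs expect RGB.
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    capture.release()
    return frames


def sample_indices(total: int, num_frames: int) -> List[int]:
    """Pick num_frames evenly spaced indices from a clip with `total` frames."""
    if total <= 0:
        raise ValueError("video has no decodable frames")
    if num_frames == 1:
        return [0]
    step = (total - 1) / (num_frames - 1)
    # Clips shorter than num_frames end up repeating some frames.
    return [round(i * step) for i in range(num_frames)]


class VideoFrameDataset:
    """Map-style dataset: item i is (list of frames, label) for video i."""

    def __init__(
        self,
        paths: Sequence[str],
        labels: Sequence[int],
        num_frames: int = 16,
        read_frames: Callable[[str], list] = read_frames_cv2,
    ):
        if len(paths) != len(labels):
            raise ValueError("paths and labels must have the same length")
        if num_frames < 1:
            raise ValueError("num_frames must be at least 1")
        self.paths = list(paths)
        self.labels = list(labels)
        self.num_frames = num_frames
        self.read_frames = read_frames  # injectable, e.g. for another decoder

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, index: int):
        frames = self.read_frames(self.paths[index])
        keep = sample_indices(len(frames), self.num_frames)
        return [frames[i] for i in keep], self.labels[index]
```

To get a single tensor per clip, the returned frames could be stacked, for example with `torch.from_numpy(np.stack(frames)).permute(0, 3, 1, 2)`, which yields shape `(T, C, H, W)`. Inside the model, the batch and time dimensions can then be merged before the CNN and split again before the LSTM. Decoding the whole video on every access is simple but slow, so extracting frames to disk once may be worth considering for larger datasets.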