How to load a large video dataset?

Hello everyone,

I have a dataset of multiple videos, each consisting of ~40,000 frames. I am trying to feed each video as one batch (batch_size=1) to a recurrent network for a regression task. In my custom data loader, I read all the preprocessed frames of one video at once. As expected, GPU memory cannot handle this, and data loading also takes a long time.

  1. Any suggestions on how to load this type of dataset onto the GPU without running into memory and time problems?

I thought about feeding in a portion of the frames, saving the hidden state, and then restarting the network with that hidden state as its initial hidden input along with the next set of frames. For example, if the GPU can process 5k frames at a time, a 40k-frame video would be processed in 8 passes, and the prediction would be generated at the end.

  2. I’m confused about how to do this with respect to backprop and, again, the data loader. Is there a way to do something like this?

Thanks in advance for answering one or both of my questions! (first has higher priority :slight_smile:)
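
The chunked idea described above is usually called truncated backpropagation through time. Below is a minimal sketch of it, not a definitive implementation. Everything named here is an assumption for illustration: `FrameRegressor` is a hypothetical GRU model, each frame is assumed to be already reduced to a 16-dimensional feature vector, and the video-level target is applied as the loss on every chunk. The key step is `h.detach()`: the hidden state value carries over to the next chunk, but gradients only flow back through the current chunk, so only one chunk has to sit on the GPU at a time.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


class FrameRegressor(nn.Module):
    """Hypothetical model: a GRU over per-frame features plus a linear head."""

    def __init__(self, feat_dim=16, hidden_dim=32):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, x, h=None):
        out, h = self.rnn(x, h)          # out: (batch, chunk_len, hidden_dim)
        return self.head(out[:, -1]), h  # predict from the last time step


def train_one_video(model, optimizer, frames, target, chunk_len):
    """frames: (1, T, feat_dim) kept on the CPU; target: (1, 1)."""
    device = next(model.parameters()).device
    loss_fn = nn.MSELoss()
    h = None
    losses = []
    for start in range(0, frames.size(1), chunk_len):
        # Only the current chunk is moved to the model's device.
        chunk = frames[:, start:start + chunk_len].to(device)
        if h is not None:
            # Keep the hidden state's value but cut the graph, so backprop
            # stops at the chunk boundary.
            h = h.detach()
        pred, h = model(chunk, h)
        # Assumption: supervise every chunk with the video-level target.
        loss = loss_fn(pred, target.to(device))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return pred.detach(), losses


model = FrameRegressor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
frames = torch.randn(1, 40, 16)   # small stand-in for a 40k-frame video
target = torch.tensor([[0.5]])
pred, losses = train_one_video(model, optimizer, frames, target, chunk_len=5)
print(len(losses), pred.shape)
```

With 40 stand-in frames and `chunk_len=5` the loop runs 8 times, mirroring the 40k / 5k example. If the loss should only be computed on the final prediction, the earlier chunks could instead be run under `torch.no_grad()`, at the cost of the gradient only seeing the last chunk.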

This might be helpful to you: Video Dataset Loading in PyTorch.

It's a custom dataset implementation that works well with any video dataset and is very fast. It also lets you specify the number of frames to load from each video, taking them evenly from the start to the end of the video. This way you might be able to make the loaded frame tensor sparser in terms of how many frames of the video it contains, and avoid running out of memory.
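
To illustrate what taking frames evenly from start to finish can mean, here is a small sketch in plain Python. It is not the linked implementation's code, and its sampling strategy may differ. `evenly_spaced_indices` is a hypothetical helper that picks frame indices so the first and last frames are always included.

```python
def evenly_spaced_indices(num_frames, num_samples):
    """Return up to num_samples frame indices spread evenly over a video."""
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    if num_samples >= num_frames:
        return list(range(num_frames))   # video is short: keep every frame
    if num_samples == 1:
        return [0]
    # Integer arithmetic keeps the indices exact and strictly increasing.
    last = num_frames - 1
    return [(i * last) // (num_samples - 1) for i in range(num_samples)]


indices = evenly_spaced_indices(40_000, 5)
print(indices)
```

Only the frames at these indices would then be read from disk, so the tensor for one video holds `num_samples` frames instead of all 40,000.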