I have a dataset of multiple videos, each consisting of ~40,000 frames. I am trying to feed every video as one batch (batch_size=1) to a recurrent network for a regression task. In my custom dataloader function, I read all the preprocessed frames of one video at once. As expected, GPU memory cannot handle this, and data loading also takes a long time.
- Any suggestions on how to load this type of dataset onto the GPU without running into memory and time problems?
I thought about feeding only part of the frames, saving the resulting hidden state, and then running the network again on the next set of frames with that saved hidden state as its initial hidden state. For example, if the GPU can process 5k frames at a time, a 40k-frame video would be processed in 8 passes, and the prediction would be generated at the end of the last pass.
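To make the arithmetic concrete, here is a minimal sketch of the chunk boundaries I have in mind. The helper name `chunk_bounds` is just something I made up for illustration, and the 5k chunk size is my guess at what the GPU can hold, not a measured value.

```python
def chunk_bounds(num_frames, chunk_size):
    """Yield (start, end) frame indices that cover the video in order."""
    for start in range(0, num_frames, chunk_size):
        # The last chunk may be shorter if chunk_size does not divide num_frames.
        yield start, min(start + chunk_size, num_frames)


bounds = list(chunk_bounds(40_000, 5_000))
print(len(bounds))            # 8
print(bounds[0], bounds[-1])  # (0, 5000) (35000, 40000)
```

With boundaries like these, the dataloader would only need to read the frames between `start` and `end` for each pass instead of the whole video.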
- I’m confused about how to do this with respect to backprop and, again, the dataloader. Is there a way to do something like this?
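For reference, this is roughly what I imagine the chunked loop could look like. It is a sketch under several assumptions, not something I know to be correct: I assume PyTorch with a GRU, toy sizes (40 frames in chunks of 5) so it runs on CPU, a made-up `ChunkedRegressor` module, and a loss computed after every chunk against the same video-level target. Detaching the hidden state between chunks means gradients stop at chunk boundaries (truncated backprop through time), so if the loss were computed only after the last chunk, the earlier chunks would receive no gradient at all.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Toy sizes (assumptions); the real video would be ~40,000 frames in 5,000-frame chunks.
num_frames, feat_dim, hidden_dim, chunk_size = 40, 8, 16, 5


class ChunkedRegressor(nn.Module):
    """Hypothetical model: a GRU followed by a linear regression head."""

    def __init__(self, feat_dim, hidden_dim):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, frames, hidden=None):
        out, hidden = self.rnn(frames, hidden)
        # Predict from the output at the last frame of the chunk.
        return self.head(out[:, -1]), hidden


model = ChunkedRegressor(feat_dim, hidden_dim)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

# Stand-ins for one preprocessed video and its regression target.
video = torch.randn(1, num_frames, feat_dim)
target = torch.randn(1, 1)

hidden = None  # the GRU starts from a zero hidden state when given None
num_chunks = 0
for start in range(0, num_frames, chunk_size):
    # In the real setup, only this slice would be loaded and moved to the GPU.
    chunk = video[:, start:start + chunk_size]
    prediction, hidden = model(chunk, hidden)

    loss = loss_fn(prediction, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Keep the hidden values but cut the graph, so the next chunk
    # does not backpropagate into this one.
    hidden = hidden.detach()
    num_chunks += 1

print(num_chunks)  # 8
```

What I am unsure about is whether cutting the gradient at every chunk boundary like this is acceptable for a regression target that depends on the whole video.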
Thanks in advance for answering one or both of my questions! (The first has higher priority.)