Speed up image loading on the CPU before transferring to the GPU

TLDR: I am unable to fully utilize the GPU because a lot of time is spent loading images. What are some strategies to keep the GPU busy at all times?

Say I have a batch of size 8. Each element in the batch is a video folder with Y frames, where Y is between 90 and 400. Since the number of frames differs between video folders, in a given batch I try to group folders with a similar number of frames to minimize padding. The images are usually resized to 224x224. The total training data is larger than system RAM. I use DataLoader with num_workers > 1 and then transfer the batch tensor to CUDA. Right now I am using 1 GPU. A rough sketch of my setup is below.
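For reference, here is a minimal sketch of what the dataset and padding collate look like. The names (`VideoFolderDataset`, `pad_collate`) are just placeholders, and the frame loading uses PIL + torchvision transforms as an illustration:

```python
import os
import torch
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class VideoFolderDataset(Dataset):
    """Loads all frames of one video folder as a (Y, C, H, W) tensor."""
    def __init__(self, video_dirs, size=224):
        self.video_dirs = video_dirs          # list of folders, one per video
        self.tf = transforms.Compose([
            transforms.Resize((size, size)),  # resize frames to 224x224
            transforms.ToTensor(),
        ])

    def __len__(self):
        return len(self.video_dirs)

    def __getitem__(self, idx):
        folder = self.video_dirs[idx]
        frame_paths = sorted(os.listdir(folder))
        frames = [self.tf(Image.open(os.path.join(folder, p)).convert("RGB"))
                  for p in frame_paths]
        return torch.stack(frames)            # (Y, 3, 224, 224), Y varies per video

def pad_collate(batch):
    """Zero-pad a list of (Y_i, C, H, W) tensors to the max Y in the batch."""
    max_len = max(v.shape[0] for v in batch)
    padded = torch.zeros(len(batch), max_len, *batch[0].shape[1:])
    for i, v in enumerate(batch):
        padded[i, :v.shape[0]] = v
    return padded
```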

I’ve tried tuning num_workers and got some speedup. Regardless, the GPU is not fully utilized, and there is a delay of a few seconds whenever a new batch is created.

What approaches can I take? I think a forward-looking queue of 2+ batches would ensure the GPU always has a batch available; something like the sketch below is what I have in mind. Can this be done in conjunction with DataLoader?
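Roughly this, where the next batch is copied to the GPU on a side stream while the current batch is being processed. `CUDAPrefetcher` is just a name I made up, not a torch class, and the overlap only happens if the DataLoader uses `pin_memory=True`:

```python
import torch

class CUDAPrefetcher:
    """Wrap a DataLoader and stage the next batch on the GPU ahead of time."""
    def __init__(self, loader, device="cuda"):
        self.loader = iter(loader)
        self.device = device
        self.stream = torch.cuda.Stream()
        self.next_batch = None
        self._preload()

    def _preload(self):
        try:
            batch = next(self.loader)
        except StopIteration:
            self.next_batch = None
            return
        with torch.cuda.stream(self.stream):
            # async host-to-device copy; needs pinned memory to actually overlap
            self.next_batch = batch.to(self.device, non_blocking=True)

    def __iter__(self):
        return self

    def __next__(self):
        if self.next_batch is None:
            raise StopIteration
        # make the default stream wait until the staged copy has finished
        torch.cuda.current_stream().wait_stream(self.stream)
        batch = self.next_batch
        self._preload()
        return batch

# usage: for batch in CUDAPrefetcher(loader): ...
```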

If you use num_workers > 1 you are already creating multiple worker processes, where each worker loads a batch and pushes it to a queue.
If you are still seeing a slowdown when loading the data, it means your current data loading pipeline is still not fast enough. A minimal example of a loader configuration that keeps batches queued up is shown below.
This post gives some insights into why data loading can be the bottleneck and lists some best practices.
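For example, something along these lines, reusing the dataset and `pad_collate` from your post (note that `prefetch_factor` and `persistent_workers` are only available in newer PyTorch releases, and the exact `num_workers` value needs to be tuned for your machine):

```python
from torch.utils.data import DataLoader

loader = DataLoader(
    dataset,                   # e.g. the VideoFolderDataset from above
    batch_size=8,
    num_workers=8,             # tune to CPU cores / disk throughput
    pin_memory=True,           # page-locked host memory -> faster async copies
    prefetch_factor=2,         # each worker keeps 2 batches ready
    persistent_workers=True,   # avoid re-spawning workers every epoch
    collate_fn=pad_collate,
)

for batch in loader:
    batch = batch.cuda(non_blocking=True)  # overlaps the copy when memory is pinned
    # forward / backward pass here
```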


Thanks @ptrblck. The link is useful.