TLDR: I am unable to fully utilize the GPU because a lot of time is spent loading images. What are some strategies to always keep the GPU busy?
Say I have a batch of size 8. Each element in the batch is a video folder with Y frames, where Y is between 90 and 400. The number of frames differs per video folder, so in a given batch I try to use video folders with a similar number of frames to minimize padding. The frames are resized to 224x224. The total training data is larger than system RAM. I use `DataLoader` with `num_workers > 1` and then transfer the batch tensor to CUDA. Right now I am using 1 GPU.
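To make the setup concrete, this is roughly the shape of my loading code (simplified; `VideoFrameDataset` and its `root` argument are stand-ins for my actual dataset class that reads and resizes the frames of one video folder):

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader

def pad_collate(batch):
    # Each element is (frames, label) with frames shaped (Y_i, 3, 224, 224);
    # pad every clip in the batch to the longest one.
    frames, labels = zip(*batch)
    frames = pad_sequence(frames, batch_first=True)  # (B, Y_max, 3, 224, 224)
    return frames, torch.tensor(labels)

# Placeholder for my real dataset class.
dataset = VideoFrameDataset(root="/data/videos")

loader = DataLoader(
    dataset,
    batch_size=8,
    shuffle=True,
    num_workers=4,          # value found by trial and error
    collate_fn=pad_collate,
    pin_memory=True,        # page-locked CPU buffers -> faster async copies
)

device = torch.device("cuda")
for frames, labels in loader:
    # non_blocking only helps when pin_memory=True on the loader
    frames = frames.to(device, non_blocking=True)
    labels = labels.to(device, non_blocking=True)
    # ... forward / backward / optimizer step ...
```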
I’ve tried to tune `num_workers` and got some speedup. Regardless, the GPU is still not fully utilized; there is a delay of a few seconds whenever a new batch is created.
What approaches can I take? I think creating a forward-looking queue of 2+ batches would ensure the GPU always has at least one batch available. Can this be done in conjunction with `DataLoader`?
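To clarify what I mean, here is an untested sketch of the kind of wrapper I am imagining (modeled on the CUDA-stream prefetching pattern I have seen in NVIDIA's Apex examples), which stages the next batch on the GPU while the current one is being processed:

```python
import torch

class CUDAPrefetcher:
    """Wraps a DataLoader and copies the next batch to the GPU on a side
    stream so the host-to-device transfer overlaps with compute."""

    def __init__(self, loader, device="cuda"):
        self.loader = loader
        self.device = torch.device(device)
        self.stream = torch.cuda.Stream(self.device)

    def __iter__(self):
        it = iter(self.loader)
        next_batch = self._preload(it)
        while next_batch is not None:
            # Make the compute stream wait until the async copy has finished.
            torch.cuda.current_stream(self.device).wait_stream(self.stream)
            batch = next_batch
            for t in batch:
                # Mark the tensor as used on the compute stream so its memory
                # is not recycled by the caching allocator too early.
                t.record_stream(torch.cuda.current_stream(self.device))
            next_batch = self._preload(it)  # kick off the next copy
            yield batch

    def _preload(self, it):
        try:
            cpu_batch = next(it)  # (frames, labels) from the DataLoader
        except StopIteration:
            return None
        with torch.cuda.stream(self.stream):
            return tuple(t.to(self.device, non_blocking=True) for t in cpu_batch)

# Usage: the wrapped loader should be created with pin_memory=True.
for frames, labels in CUDAPrefetcher(loader):
    ...  # forward / backward / step; tensors are already on the GPU
```

My understanding is that `DataLoader` itself already keeps roughly `prefetch_factor * num_workers` batches queued on the CPU side, so a wrapper like this would only add the GPU-side staging. Is that the right way to think about it, or is there a better approach?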