Reduce Idleness Between Batch Loads

12 workers supply the GPU with data. During inference, the GPU shows:

  • 100% GPU utilisation
  • all GPU RAM is used

htop shows, more or less, that:

  • none of the CPUs are bottlenecked.

But between batch loads, there are moments when the GPU % drops to zero.

I suspect these drops are the main reason for low overall GPU utilisation.

Is there any way to reduce this idleness?

I have many other avenues for improving performance, but this one seems blindingly obvious; I am just unsure how to address it.
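One way to confirm that the gaps are loader-bound is to time the data-loading step separately from the GPU step. A minimal sketch (illustrative only, not the poster's code):

```python
import time
import torch

def run_epoch(loader, model, device):
    """Split wall time into 'waiting on the loader' vs. 'GPU compute'."""
    data_time = 0.0
    compute_time = 0.0
    end = time.perf_counter()
    for images, _ in loader:
        data_time += time.perf_counter() - end   # time spent waiting on the loader
        start = time.perf_counter()
        images = images.to(device, non_blocking=True)
        with torch.no_grad():
            model(images)
        if device.type == "cuda":
            torch.cuda.synchronize()             # make GPU work visible to the host clock
        compute_time += time.perf_counter() - start
        end = time.perf_counter()
    return data_time, compute_time
```

If `data_time` dominates, the workers cannot keep up and the idle gaps are the loader's fault.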


Can you describe the structure of the batch and the size of the tensors?
I assume you have memory pinning enabled and non_blocking=True?
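For context, the setup being asked about would look roughly like this (dataset and names are illustrative, not the poster's actual code):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Illustrative stand-in dataset; the poster's dataset/collate function is not shown.
dataset = TensorDataset(
    torch.randn(128, 3, 256, 256),
    torch.zeros(128, dtype=torch.long),
)

loader = DataLoader(
    dataset,
    batch_size=32,
    num_workers=12,           # parallel CPU workers feeding the GPU
    pin_memory=True,          # page-locked host memory enables async H2D copies
    persistent_workers=True,  # keep workers alive between epochs
    prefetch_factor=2,        # batches each worker pre-loads ahead of time
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
for images, labels in loader:
    # non_blocking=True can overlap the copy with GPU compute when memory is pinned
    images = images.to(device, non_blocking=True)
    labels = labels.to(device, non_blocking=True)
    # ... inference ...
```

Without pinned memory, `non_blocking=True` silently falls back to a synchronous copy, so the two settings only help together.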


Here are some more details:



Collate function:


non_blocking etc.:


Image size (actually the stack of images): torch.Size([2010, 3, 256, 256])
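As a side note, a stack that size is substantial in host memory; assuming float32 (4 bytes per element), it works out to roughly 1.5 GiB, which matters when that buffer is pinned:

```python
# Memory footprint of the stacked image tensor from the post,
# assuming float32 (4 bytes per element).
num_elements = 2010 * 3 * 256 * 256   # 395,182,080 elements
size_bytes = num_elements * 4         # 1,580,728,320 bytes
size_gib = size_bytes / 2**30         # ≈ 1.47 GiB
print(f"{size_bytes} bytes ≈ {size_gib:.2f} GiB")
```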

OK, I realised I didn't read the num_workers argument correctly! All good now.
