GPU is idle while dataloader's subprocesses are running

The output of top and nvidia-smi suggested that:

  • Volatile GPU-Util stayed at almost zero while the subprocesses of my dataloader were using a lot of CPU
  • once the subprocesses had disappeared, Volatile GPU-Util went high, and the epoch was reported as complete right after that

It seems that the dataloader only started to provide batches after it had finished fetching the entire contents of the dataset.
Is this process really asynchronous or not? How can I do better?
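
One way to check this would be to time how long the training loop waits for each batch separately from how long it spends in the training step. Below is a minimal sketch of that idea, not the code I actually ran: `measure_wait` and `slow_batches` are hypothetical names, and `slow_batches` is only a stand-in for a slow loader so the example runs without PyTorch.

```python
import time


def measure_wait(batches, step):
    """Return (seconds spent waiting for batches, seconds spent in step).

    `batches` is any iterable of batches (e.g. a DataLoader);
    `step` is a callable that consumes one batch.
    """
    t0 = time.perf_counter()
    it = iter(batches)  # for a DataLoader this is where workers start
    wait = time.perf_counter() - t0
    work = 0.0
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(it)  # blocks until a batch is available
        except StopIteration:
            break
        t1 = time.perf_counter()
        step(batch)
        t2 = time.perf_counter()
        wait += t1 - t0
        work += t2 - t1
    return wait, work


def slow_batches(n, delay):
    """Hypothetical stand-in for a loader that needs `delay` seconds per batch."""
    for i in range(n):
        time.sleep(delay)
        yield i


if __name__ == "__main__":
    wait, work = measure_wait(slow_batches(5, 0.02), lambda batch: None)
    print(f"waiting for data: {wait:.3f}s, training step: {work:.3f}s")
```

If the waiting time dominates, the loader is the bottleneck rather than the GPU. Note that CUDA kernels are launched asynchronously, so for a real training step the `step` callable would probably need to call `torch.cuda.synchronize()` before returning for the second number to be meaningful.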

my environment:

  • Ubuntu 18.04
  • python 3.7.3 installed by conda 4.6.11
  • pytorch 1.1.0
  • cuda 10.0.130
  • data: my own image dataset, which gives a pair of images (3x256x256) as a single input.
    The data is stored as a bunch of JPEG files and loaded with torchvision's default loader

My dataloader has 18 workers and is set with pin_memory = True and shuffle = True.

Thanks in advance,
