The output of top and nvidia-smi suggested that:
- Volatile GPU-Util stayed at almost zero while the subprocesses of my DataLoader were using a lot of CPU
- once the subprocesses had disappeared, Volatile GPU-Util went up, and shortly afterwards the epoch was reported as complete
It seems that the DataLoader only started to provide batches after it had finished fetching the entire contents of the dataset.
Was this process really asynchronous or not? How can I do better?
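To check where the time goes, something like the following should show whether the loop is actually blocked on the DataLoader (a minimal sketch; the model/criterion/optimizer and the (inputs, targets) unpacking are placeholders, not my actual code):

```python
import time

import torch


def time_one_epoch(loader, model, criterion, optimizer, device):
    """Split each iteration into time blocked on the DataLoader vs GPU work."""
    model.train()
    fetch_time, compute_time = 0.0, 0.0
    end = time.time()
    for inputs, targets in loader:  # batch format is an assumption, not my real one
        t0 = time.time()
        fetch_time += t0 - end  # time spent waiting for the next batch

        inputs = inputs.to(device, non_blocking=True)
        targets = targets.to(device, non_blocking=True)

        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()

        torch.cuda.synchronize()  # wait for the GPU so the split is meaningful
        compute_time += time.time() - t0
        end = time.time()

    print(f"waiting on DataLoader: {fetch_time:.1f}s, GPU work: {compute_time:.1f}s")
```

My understanding is that if the workers really prefetch asynchronously, the fetch part should stay small after the first few batches.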
My environment:
- Ubuntu 18.04
- Python 3.7.3 installed via conda 4.6.11
- PyTorch 1.1.0
- CUDA 10.0.130
- data: my own image dataset, which gives a pair of images (3x256x256) as a single input; the data is stored as a bunch of JPEG files and loaded by torchvision's default loader

My DataLoader has 18 workers and is set with pin_memory=True and shuffle=True, roughly as in the sketch below.
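For reference, the setup looks roughly like this (a simplified sketch; PairDataset, the file naming scheme, the root path, and the batch size are placeholders for my actual code):

```python
from pathlib import Path

from torch.utils.data import Dataset, DataLoader
from torchvision import transforms
from torchvision.datasets.folder import default_loader


class PairDataset(Dataset):
    """Placeholder for my dataset: one item is a pair of 3x256x256 JPEG images."""

    def __init__(self, root):
        # hypothetical "*_a.jpg" / "*_b.jpg" naming; my real layout differs
        self.paths_a = sorted(Path(root).glob("*_a.jpg"))
        self.to_tensor = transforms.ToTensor()

    def __len__(self):
        return len(self.paths_a)

    def __getitem__(self, idx):
        path_a = self.paths_a[idx]
        path_b = path_a.with_name(path_a.name.replace("_a", "_b"))
        img_a = self.to_tensor(default_loader(str(path_a)))  # torchvision's default (PIL) loader
        img_b = self.to_tensor(default_loader(str(path_b)))
        return img_a, img_b


loader = DataLoader(
    PairDataset("/path/to/images"),  # placeholder root
    batch_size=32,                   # placeholder, not my actual batch size
    shuffle=True,
    num_workers=18,
    pin_memory=True,
)
```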
Thanks in advance,