Every n-th batch (n = num_workers) takes exceptionally long to process

I’ve written a data loader for the Cityscapes and KITTI datasets, both stored on HDDs. All preprocessing is done on torch tensors and consists only of a center crop plus horizontal and vertical flips, all implemented with t[:, top:bot, left:right] slices or torch.flip(t, dims).
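For reference, tensor-only preprocessing like the one described could look roughly like the sketch below. It assumes CHW tensors and a 0.5 flip probability; center_crop and random_flips are hypothetical names, not my actual loader code. Slicing returns a view, while torch.flip always copies, so the flip call is skipped when no axis was drawn.

```python
import torch


def center_crop(t: torch.Tensor, crop_h: int, crop_w: int) -> torch.Tensor:
    """Center-crop a CHW tensor using plain slicing (hypothetical helper).

    Slicing returns a view of the original tensor, so no data is copied.
    """
    _, h, w = t.shape
    top = (h - crop_h) // 2
    left = (w - crop_w) // 2
    return t[:, top:top + crop_h, left:left + crop_w]


def random_flips(t: torch.Tensor, p: float = 0.5) -> torch.Tensor:
    """Flip a CHW tensor horizontally and/or vertically, each with probability p.

    Hypothetical helper; assumes dim 1 is height and dim 2 is width.
    """
    dims = []
    if torch.rand(1).item() < p:
        dims.append(2)  # horizontal flip: reverse the width axis
    if torch.rand(1).item() < p:
        dims.append(1)  # vertical flip: reverse the height axis
    # torch.flip always copies its input, so only call it when a flip was drawn
    return torch.flip(t, dims) if dims else t
```

These ops are cheap compared with reading and decoding a full-resolution image, so they are unlikely to be the cause of the stall.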

I’ve tried PIL, cv2, and skimage.io for reading images, but there’s no difference in I/O speed.

In nvidia-smi I can see the GPU starving (0% utilization), which happens every n = num_workers iterations.
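One plausible explanation (my assumption, not confirmed): as far as I understand, the DataLoader hands whole batches to workers in round-robin order and each worker builds its batch alone. If loading one batch takes longer than the GPU needs for num_workers training steps, the workers finish at roughly the same time, the loop drains those ready batches almost instantly, and then it stalls until the next round. The toy model below (simulate_waits is a hypothetical helper; it ignores the prefetch limit and assumes constant load and step times) reproduces that periodic wait.

```python
def simulate_waits(num_workers, num_batches, load_time, step_time):
    """Return how long the training loop waits before each batch.

    Toy model (assumption): worker i builds batches i, i + num_workers, ...
    back to back, each taking load_time; the loop consumes batches in order
    and spends step_time on each. The prefetch limit is ignored.
    """
    # Time at which batch b is finished by its worker.
    ready = [(b // num_workers + 1) * load_time for b in range(num_batches)]
    waits = []
    clock = 0.0
    for b in range(num_batches):
        wait = max(0.0, ready[b] - clock)  # block until batch b is ready
        waits.append(wait)
        clock += wait + step_time  # then run the training step
    return waits
```

For example, simulate_waits(4, 8, 10, 1) returns [10.0, 0.0, 0.0, 0.0, 6.0, 0.0, 0.0, 0.0]: a long wait every 4th batch, which is exactly the kind of pattern nvidia-smi shows here.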

Is this a disk bottleneck, or am I missing some optimization?
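To separate disk I/O from decoding, one option is to time raw byte reads of the same image files without decoding them. A minimal sketch (read_throughput is a hypothetical helper; the paths would be any sample of Cityscapes/KITTI files):

```python
import time
from pathlib import Path


def read_throughput(paths):
    """Read each file fully as raw bytes; return (total_bytes, seconds, MB/s).

    Hypothetical helper. No decoding happens here, so this isolates disk I/O
    from image decoding. Recently read files may come from the OS page cache,
    so use files that have not been touched recently to measure the disk itself.
    """
    total = 0
    start = time.perf_counter()
    for p in paths:
        total += len(Path(p).read_bytes())
    elapsed = time.perf_counter() - start
    mb_per_s = total / 1e6 / elapsed if elapsed > 0 else float("inf")
    return total, elapsed, mb_per_s
```

If raw reads alone already account for most of the per-batch time, the HDD is the likely bottleneck; if they are fast and only the PIL/cv2 decode is slow, the loader is CPU-bound instead. A second run over the same files will look much faster than the disk really is because of caching.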

Problem characteristics

Visible when:

  • batch_size >= 4
  • num_workers >= 4
  • PyTorch 1.0 (stable)
  • machine: Google Cloud Compute Engine (4 cores, Skylake or newer, 16 GB RAM, 1x P100)
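To confirm the every-num_workers pattern outside nvidia-smi, one could time how long each next() on the loader blocks. A minimal sketch (time_batches and its step callback are hypothetical; it works with any iterable, including a torch.utils.data.DataLoader):

```python
import time


def time_batches(loader, num_batches, step=None):
    """Return how long the loop blocked on each of the first num_batches batches.

    Hypothetical helper. `step` is an optional callback standing in for the
    training step; its runtime is not included in the measured waits.
    """
    waits = []
    it = iter(loader)
    for _ in range(num_batches):
        start = time.perf_counter()
        try:
            batch = next(it)
        except StopIteration:
            break
        waits.append(time.perf_counter() - start)
        if step is not None:
            step(batch)  # e.g. forward/backward pass on the batch
    return waits
```

With PyTorch 1.0 this could be called on something like DataLoader(dataset, batch_size=4, num_workers=4, pin_memory=True), where dataset stands in for the Cityscapes/KITTI dataset. Waits that spike at batches 0, 4, 8 and so on would match the stall described above. Since CUDA kernels run asynchronously, ending step with torch.cuda.synchronize() keeps the loop from racing ahead of the GPU and distorting the timings.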