I’ve written a data loader for the Cityscapes and KITTI datasets, both stored on HDDs. All preprocessing is done on torch tensors: only a center crop plus horizontal and vertical flips, implemented via t[:, top:bot, left:right] slices and torch.flip(t, dims).
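For reference, here is a minimal sketch of the kind of loader I mean (class name, crop size, and flip probabilities are illustrative, not my exact code):

```python
import random

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset


class SegDataset(Dataset):
    """Illustrative loader: read image, center-crop via slicing, random flips."""

    def __init__(self, paths, crop=512):
        self.paths = paths
        self.crop = crop

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # read with PIL, convert to CHW torch tensor
        img = torch.from_numpy(np.array(Image.open(self.paths[idx]))).permute(2, 0, 1)
        _, h, w = img.shape
        top, left = (h - self.crop) // 2, (w - self.crop) // 2
        img = img[:, top:top + self.crop, left:left + self.crop]  # center crop
        if random.random() < 0.5:
            img = torch.flip(img, [2])  # horizontal flip
        if random.random() < 0.5:
            img = torch.flip(img, [1])  # vertical flip
        return img
```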
I’ve tried PIL, cv2, and skimage.io for reading the images, but there’s no difference in I/O speed.
I can see the GPU starving (0% utilization) in nvidia-smi, which happens every num_workers-th iteration. Is this a disk bottleneck, or am I missing some optimization?
Problem characteristics, visible when:
- batch_size >= 4
- num_workers >= 4
- torch 1.0 stable
- machine: Google Cloud Compute (4 cores, Skylake or newer, 16 GB RAM, 1x P100)
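One way to confirm whether the loader is the bottleneck is to time how long each next(iter) call blocks waiting on the workers; if every num_workers-th fetch is much slower than the rest, the workers can’t keep up with consumption. A rough sketch (FakeData and the sleep delay are stand-ins simulating slow HDD reads, not my actual dataset):

```python
import time

import torch
from torch.utils.data import DataLoader, Dataset


class FakeData(Dataset):
    """Stand-in dataset that simulates slow disk reads."""

    def __init__(self, n=64, delay=0.05):
        self.n, self.delay = n, delay

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        time.sleep(self.delay)  # simulated HDD latency
        return torch.zeros(3, 64, 64)


loader = DataLoader(FakeData(), batch_size=4, num_workers=2)
it = iter(loader)
waits = []
for _ in range(8):
    t0 = time.perf_counter()
    next(it)  # time spent blocked waiting on worker processes
    waits.append(time.perf_counter() - t0)
print([round(w, 3) for w in waits])
```

If the printed wait times show a periodic spike instead of staying near zero, the GPU stall is coming from data loading rather than the model.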