Dataloader slower than normal execution?

gauthierb · October 16, 2020, 8:23pm

Hi,
I am trying to retrieve a sequence of bounding boxes to train an RNN. To do so, I defined a DataLoader like this:

I load a sequence of tensors from memory, where the tensors are bounding boxes that have already been resized to fit VGG16. This is a quick operation, so I thought a DataLoader built this way would be faster than loading an image, cropping it to the bounding box, and then resizing it on every call to get_item. The problem is that, for some reason, getting a batch takes approximately 10 seconds with num_workers=1 and roughly 2 minutes with num_workers=10.
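
For readers without the original code, here is a minimal sketch of a dataset of the kind described above. It is not the code from this post: the class name `BoxSequenceDataset`, the helper `make_dummy_sequences`, the file layout, and the 3x32x32 tensor size (much smaller than a VGG16 input, to keep the example fast) are all assumptions made for illustration.

```python
import os
import tempfile

import torch
from torch.utils.data import DataLoader, Dataset


class BoxSequenceDataset(Dataset):
    """Hypothetical dataset: one item is one sequence of saved bounding-box tensors."""

    def __init__(self, sequences, labels):
        self.sequences = sequences  # list of lists of file paths (assumed layout)
        self.labels = labels

    def __len__(self):
        return len(self.sequences)

    def __getitem__(self, idx):
        # Load each pre-cropped, pre-resized tensor and stack along a new time axis.
        boxes = [torch.load(path) for path in self.sequences[idx]]
        return torch.stack(boxes), self.labels[idx]


def make_dummy_sequences(root, n_sequences=8, seq_len=4):
    """Write small random tensors to disk so the sketch runs on its own."""
    sequences = []
    for s in range(n_sequences):
        paths = []
        for t in range(seq_len):
            path = os.path.join(root, f"seq{s}_box{t}.pt")
            torch.save(torch.rand(3, 32, 32), path)
            paths.append(path)
        sequences.append(paths)
    return sequences, list(range(n_sequences))


with tempfile.TemporaryDirectory() as root:
    sequences, labels = make_dummy_sequences(root)
    dataset = BoxSequenceDataset(sequences, labels)
    # num_workers=0 loads in the main process, which is the simplest baseline.
    loader = DataLoader(dataset, batch_size=4, num_workers=0)
    boxes, targets = next(iter(loader))
    print(boxes.shape)  # torch.Size([4, 4, 3, 32, 32])
```

The default collate function stacks the per-item tensors, so every sequence in a batch must have the same length for this sketch to work.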
So I tried to build a batch manually like this:

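import torch

batch = []
for i in range(64):
    det_frame_list = traindata[i]  # entries of sequence i that torch.load can read
    label = trainlabel[i]
    # Load every bounding-box tensor of the sequence
    bbs = [torch.load(frame) for frame in det_frame_list]
    # Stack the sequence into a single tensor and add it to the batch
    batch.append(torch.stack(bbs))
batch = torch.stack(batch)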

(don’t know how to use BBCode sorry!)

to see whether my approach itself was slow, but it finishes instantly with the correct output.
Since I am new to PyTorch, I can't figure out why the DataLoader takes ages to make a batch. Any help understanding this would be greatly appreciated!
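
One way to narrow this down is to time the first few batches separately for each worker setting. The following is only a sketch: the helper name `time_first_batches` and the random `TensorDataset` stand-in are assumptions, and the real dataset would be passed in instead.

```python
import time

import torch
from torch.utils.data import DataLoader, TensorDataset


def time_first_batches(dataset, batch_size, num_workers, n_batches=3):
    """Return the seconds spent waiting for each of the first n_batches batches.

    The first entry also includes creating the iterator, which is where
    worker processes are started when num_workers > 0.
    """
    loader = DataLoader(dataset, batch_size=batch_size, num_workers=num_workers)
    timings = []
    start = time.perf_counter()
    for i, _ in enumerate(loader):
        timings.append(time.perf_counter() - start)
        if i + 1 == n_batches:
            break
        start = time.perf_counter()
    return timings


# Stand-in data (an assumption); replace with the real dataset.
dummy = TensorDataset(torch.rand(256, 3, 32, 32), torch.zeros(256))
# num_workers=0 loads in the main process and serves as the baseline.
print(time_first_batches(dummy, batch_size=64, num_workers=0))
```

Calling the helper again with `num_workers=1` and `num_workers=10` shows whether the time is spent in the first batch only or in every batch. On platforms that start workers with spawn, such calls should sit under an `if __name__ == "__main__":` guard.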
Thanks