I have noticed that my DataLoader gets slower as I add more workers, compared to num_workers=0.
My dataset definition is quite simple:
```python
import accimage
import torch


class Dataset(torch.utils.data.Dataset):
    def __init__(self, file_paths, labels, transform=None):
        self.file_paths = file_paths
        self.labels = labels
        self.transform = transform

    def __len__(self):
        return len(self.file_paths)

    def __getitem__(self, index):
        path_img = self.file_paths[index]
        image = accimage.Image(path_img)
        target = self.labels[index]
        if self.transform is not None:
            image = self.transform(image)
        return image, target
```
file_paths and labels can be quite large (over 1 million entries each). To my understanding, the DataLoader would push the dataset onto a worker every time it is indexed, is that right? Could the slowdown come from the overhead this creates?
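To give a sense of scale: if the dataset (or just its metadata) really were serialized and shipped to a worker, the payload would not be tiny. This is a synthetic stand-in, not my real data, just to estimate the size of ~1 million paths and labels:

```python
import pickle

# Synthetic stand-ins for my real metadata (~1 million entries each)
file_paths = [f"/data/img_{i:07d}.jpg" for i in range(1_000_000)]
labels = list(range(1_000_000))

# Size of the pickled metadata a worker would have to receive
payload = pickle.dumps((file_paths, labels))
print(f"pickled metadata size: {len(payload) / 1e6:.1f} MB")
```

On my machine this comes out to tens of megabytes, which is why I suspect per-index transfer would hurt.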
If this is indeed the reason for the slow performance, how could I work around it? Would it be possible to parallelize only the loading and transforming of an image, i.e. send just a path, the label, and the transforms to a worker?
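To make that second question concrete, here is roughly what I have in mind, sketched with threads purely for illustration (load_and_transform and its body are placeholders, not my real code):

```python
from concurrent.futures import ThreadPoolExecutor


def load_and_transform(args):
    # Placeholder: real code would open the image at `path` with accimage
    # and apply self.transform, returning (tensor, label).
    path, label = args
    return path.upper(), label  # stand-in for the actual transform


# Each work item is just (path, label); only these small tuples
# go to the workers, not the whole dataset object.
items = [("/data/a.jpg", 0), ("/data/b.jpg", 1)]
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(load_and_transform, items))
```

The point is that each worker would only ever see the small (path, label) tuples it is asked to process.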
Sorry if this just reveals severe misconceptions about how parallelism is implemented here (or works conceptually in general). My setup is a single CPU with 4 connected GPUs, if that is relevant.