I have noticed that my DataLoader gets slower as I add more workers, compared to num_workers=0.
My dataset definition is quite simple:
```python
import accimage
import torch


class Dataset(torch.utils.data.Dataset):
    def __init__(self, file_paths, labels, transform=None):
        self.file_paths = file_paths
        self.labels = labels
        self.transform = transform

    def __len__(self):
        return len(self.file_paths)

    def __getitem__(self, index):
        path_img = self.file_paths[index]
        image = accimage.Image(path_img)
        target = self.labels[index]
        if self.transform is not None:
            image = self.transform(image)
        return image, target
```
file_paths and labels can be quite large (over 1 million entries each). To my understanding, the DataLoader would push the dataset onto a worker every time it is indexed, is that right? Could the slowdown come from the overhead this creates?
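To give a sense of scale: if the dataset (or just its metadata) really were serialized and shipped to a worker, the payload would not be tiny. This is a synthetic stand-in, not my real data, just to estimate the size of ~1 million paths and labels:

```python
import pickle

# Synthetic stand-ins for my real metadata (~1 million entries each)
file_paths = [f"/data/img_{i:07d}.jpg" for i in range(1_000_000)]
labels = list(range(1_000_000))

# Size of the pickled metadata a worker would have to receive
payload = pickle.dumps((file_paths, labels))
print(f"pickled metadata size: {len(payload) / 1e6:.1f} MB")
```

On my machine this comes out to tens of megabytes, which is why I suspect per-index transfer would hurt.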
If this is indeed the reason for the slow performance, how could I work around it? Would it be possible to parallelize only the loading and transforming of an image, i.e. send just a path, the label, and the transforms to a worker?
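To make that second question concrete, here is roughly what I have in mind, sketched with threads purely for illustration (load_and_transform and its body are placeholders, not my real code):

```python
from concurrent.futures import ThreadPoolExecutor


def load_and_transform(args):
    # Placeholder: real code would open the image at `path` with accimage
    # and apply self.transform, returning (tensor, label).
    path, label = args
    return path.upper(), label  # stand-in for the actual transform


# Each work item is just (path, label); only these small tuples
# go to the workers, not the whole dataset object.
items = [("/data/a.jpg", 0), ("/data/b.jpg", 1)]
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(load_and_transform, items))
```

The point is that each worker would only ever see the small (path, label) tuples it is asked to process.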
Sorry if this just reveals severe misconceptions about how parallelism is implemented here (or works conceptually in general). My setup is a single CPU with 4 connected GPUs, if that is relevant.