DataLoader slower with num_workers > 0

A very simple use case.

Some data is pre-loaded in memory, and I just fetch each sample directly from memory:

from torch.utils.data import Dataset, DataLoader
import numpy as np

class NpDataset(Dataset):
    def __init__(self, n, m):
        self.X = np.random.rand(n, m)
        self.Y = np.random.rand(n, m)
    def __getitem__(self, index):
        return self.X[index], self.Y[index]
    def __len__(self):
        return self.X.shape[0]

I create such a dataset with 100,000 samples of size 1000:
ds = NpDataset(100_000, 1000)

Then, if I iterate over it with a DataLoader, using multiple workers does not improve the iteration speed:

loader = DataLoader(ds, batch_size=100, num_workers=4, shuffle=True, pin_memory=False)
for x, y in loader:
    pass  # iterate only, no training step

takes 3.7s
while with num_workers=0 it takes only 2s.

Any idea why multiple workers do not improve the speed in such a case?
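For anyone who wants to reproduce the comparison, here is a minimal timing sketch. It reuses the NpDataset class from the question; the helper name time_epoch and the smaller array size in the demo are my own choices, not part of the original post. One plausible reason for the result is that indexing a NumPy array is so cheap that starting worker processes and passing batches back to the main process costs more than the parallelism saves, but treat that as an assumption to check with the script rather than a confirmed diagnosis.

```python
import time

import numpy as np
from torch.utils.data import DataLoader, Dataset


class NpDataset(Dataset):
    """Same in-memory dataset as in the question."""

    def __init__(self, n, m):
        self.X = np.random.rand(n, m)
        self.Y = np.random.rand(n, m)

    def __getitem__(self, index):
        return self.X[index], self.Y[index]

    def __len__(self):
        return self.X.shape[0]


def time_epoch(ds, num_workers, batch_size=100):
    """Return wall-clock seconds for one full pass over ds (hypothetical helper)."""
    loader = DataLoader(ds, batch_size=batch_size, num_workers=num_workers,
                        shuffle=True, pin_memory=False)
    start = time.perf_counter()
    for x, y in loader:
        pass  # no training step, so only the loading cost is measured
    return time.perf_counter() - start


if __name__ == "__main__":
    # Smaller than the 100,000 x 1000 arrays in the question to keep the run
    # short; scale n back up to match the original numbers.
    ds = NpDataset(10_000, 1000)
    for workers in (0, 2, 4):
        print(f"num_workers={workers}: {time_epoch(ds, workers):.2f}s")
```

The measured time includes worker startup, which is part of what you pay on every epoch unless the workers are kept alive.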


Selecting num_workers is pretty tricky. Since I slowly migrated to PyTorch Lightning, I get a warning that suggests a suitable num_workers for my hardware and data. In plain PyTorch, though, I think it is still trial and error as of now.

For me, increasing num_workers reduces the data loading time per batch, but loading also occasionally slows down so much that, measured over e.g. 100 batches, it is slower than with num_workers=0. I haven’t figured out what causes these hiccups.
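One way to locate such hiccups is to record how long the main process waits for each batch. The sketch below is only an illustration: batch_wait_times is a hypothetical helper, and the random TensorDataset stands in for real data.

```python
import time

import torch
from torch.utils.data import DataLoader, TensorDataset


def batch_wait_times(loader):
    """Return the seconds spent waiting for each batch of one epoch."""
    waits = []
    it = iter(loader)  # with num_workers > 0, worker processes start here
    while True:
        start = time.perf_counter()
        try:
            next(it)
        except StopIteration:
            break
        waits.append(time.perf_counter() - start)
    return waits


if __name__ == "__main__":
    # Placeholder data; swap in your own dataset.
    ds = TensorDataset(torch.rand(10_000, 100), torch.rand(10_000, 100))
    loader = DataLoader(ds, batch_size=100, num_workers=2)
    waits = batch_wait_times(loader)
    # Indices of the five slowest batches, slowest first.
    slowest = sorted(range(len(waits)), key=waits.__getitem__, reverse=True)[:5]
    print("median wait:", sorted(waits)[len(waits) // 2])
    print("slowest batches (index, seconds):", [(i, waits[i]) for i in slowest])
```

Worker startup happens in iter(loader), before the first timed wait, so the list reflects only the per-batch waits. If the slow batches come at regular intervals, that pattern is a useful clue to what the workers are doing.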

I’ve also been experiencing the same issue for a while. I am not even sure I ever truly benefited from using multiple workers, since I noticed this problem rather late. I have 8 cores and have tried running with different numbers of workers: 0, 1, 2, …, 8. The main thread (0 workers) consistently gave me the fastest loading. This is also the case for data that is pre-loaded in memory.

PyTorch version 1.8.1


Facing the same issue.

Any explanation for this? I’m experiencing the same issue.

Same issue here using PyTorch 1.12.0. PyTorch Lightning throws a PossibleUserWarning and suggests using 8 workers (which is the number of cores in my M1 CPU), but doing so results in a huge slowdown.


Also noticing this on macOS. PyTorch 1.13; PL 1.18

Also experiencing this on an M1 Pro machine. PyTorch 2.0.1, PyTorch Lightning 2.0.3.

Same behaviour on Windows 10 Pro, PyTorch Lightning 2.0.4, Torch 2.0.0+cu117.
I have 20 cores, and setting num_workers to 20 causes a slowdown of several minutes between epochs. Setting num_workers to 1 or 2 already gives a much better result, with a slowdown of about 20 seconds. With num_workers at 0 I got by far the best results, with a slowdown of maybe 2-3 seconds at most.
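A delay between epochs that grows with num_workers is consistent with the worker processes being shut down and started again for every epoch, which is the DataLoader default. I can't confirm that this is the cause here, but the persistent_workers argument of DataLoader keeps the workers alive across epochs and is cheap to try. In this sketch, make_loader is a hypothetical helper and the TensorDataset is placeholder data.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def make_loader(ds, num_workers, batch_size=100):
    """Build a DataLoader that keeps its workers alive between epochs."""
    return DataLoader(
        ds,
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,
        # persistent_workers=True raises a ValueError when num_workers == 0,
        # so it is only enabled when there are workers to keep alive.
        persistent_workers=num_workers > 0,
    )


if __name__ == "__main__":
    # Placeholder data; swap in your own dataset.
    ds = TensorDataset(torch.rand(10_000, 100), torch.rand(10_000, 100))
    loader = make_loader(ds, num_workers=2)
    for epoch in range(3):
        for x, y in loader:
            pass  # workers are created once and reused for all three epochs
```

The same loader can be returned from a Lightning train_dataloader hook, since Lightning consumes ordinary DataLoader objects.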

Same on Ubuntu 20.04.5 LTS, using PyTorch 2.1.0 and Lightning 2.1.0. Are there any updates on this? Is there any guideline as to when we should set num_workers > 0?
