A very simple use case: some data is pre-loaded in memory, and I just fetch each sample directly from memory.
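from torch.utils.data import Dataset, DataLoader
import numpy as np

class NpDataset(Dataset):
    def __init__(self, n, m):
        # Pre-load n random samples of size m into memory
        self.X = np.random.rand(n, m)
        self.Y = np.random.rand(n, m)

    def __len__(self):
        return len(self.X)

    def __getitem__(self, index):
        return self.X[index], self.Y[index]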
I create such a dataset with 100,000 samples of size 1000:
ds = NpDataset(100_000, 1000)
Then, if I iterate over it with a DataLoader, using multiple workers does not improve the speed of the iteration:
loader = DataLoader(ds, batch_size=100, num_workers=4, shuffle=True, pin_memory=False)
for x, y in loader:
    pass
With num_workers=0 it takes only 2s.
Any idea why multiple workers do not improve the speed in such a case?
Selecting num_workers is pretty tricky. As I slowly migrated to PyTorch Lightning, I found that it gives you a warning with a suitable number for num_workers depending on your hardware and data. But in plain PyTorch I think it’s still trial and error as of now.
For me, increasing num_workers reduces the data loading time per batch, but it also occasionally slows down so much that, e.g. measured per 100 batches, it is slower than with num_workers=0. I haven’t figured out what causes these hiccups.
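One way to look for such hiccups is to time how long each individual batch takes to arrive, rather than only the whole epoch. The following is just a rough sketch, not a definitive benchmark: the helper name batch_fetch_times and the small random TensorDataset are assumptions made for illustration, so swap in your own dataset and worker counts.

```python
import time

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset


def batch_fetch_times(loader):
    """Return the wall-clock seconds spent waiting for each batch.

    Hypothetical helper for illustration. The iterator is created before
    timing starts, so worker start-up inside iter() is not counted.
    """
    times = []
    it = iter(loader)
    while True:
        start = time.perf_counter()
        try:
            next(it)
        except StopIteration:
            break
        times.append(time.perf_counter() - start)
    return times


if __name__ == "__main__":
    # Assumed stand-in data: 10,000 samples of size 100, already in memory.
    ds = TensorDataset(torch.rand(10_000, 100), torch.rand(10_000, 100))
    for workers in (0, 2):
        loader = DataLoader(ds, batch_size=100, num_workers=workers, shuffle=True)
        t = np.array(batch_fetch_times(loader))
        # A max or p99 far above the mean points at occasional stalls.
        print(
            f"num_workers={workers}: mean={t.mean():.5f}s "
            f"p99={np.percentile(t, 99):.5f}s max={t.max():.5f}s"
        )
```

Comparing the mean against the maximum and the 99th percentile should show whether the slowdown is spread evenly or comes from a few very slow batches.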
I’ve also been experiencing the same issue for a while. I am not even sure I ever truly benefited from using multiple workers, since I noticed this problem rather late. I have 8 cores and I have tried running with different numbers of workers: 0, 1, 2, …, 8. The main thread (0 workers) consistently gave me the fastest loading. This is also the case for data that is pre-loaded in memory.
“pytorch version 1.8.1”
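For anyone who wants to repeat this comparison on their own machine, here is a minimal sketch of a sweep over worker counts. The class name InMemoryDataset, the helper time_epoch, and the reduced dataset size are my assumptions to keep the run short; they are not taken from the posts above, and the timings will depend heavily on hardware and PyTorch version.

```python
import time

import numpy as np
from torch.utils.data import Dataset, DataLoader


class InMemoryDataset(Dataset):
    """Random data held entirely in memory (illustrative stand-in)."""

    def __init__(self, n, m):
        self.X = np.random.rand(n, m)
        self.Y = np.random.rand(n, m)

    def __len__(self):
        return len(self.X)

    def __getitem__(self, index):
        return self.X[index], self.Y[index]


def time_epoch(ds, num_workers, batch_size=100):
    """Iterate once over ds and return (elapsed seconds, number of batches)."""
    loader = DataLoader(
        ds, batch_size=batch_size, num_workers=num_workers, shuffle=True
    )
    start = time.perf_counter()
    n_batches = 0
    for x, y in loader:
        n_batches += 1
    return time.perf_counter() - start, n_batches


if __name__ == "__main__":
    # Assumed smaller size than in the thread so the sweep finishes quickly.
    ds = InMemoryDataset(10_000, 100)
    for workers in (0, 1, 2, 4):
        elapsed, n_batches = time_epoch(ds, workers)
        print(f"num_workers={workers}: {elapsed:.3f}s for {n_batches} batches")
```

The elapsed time here includes worker start-up, since the loader is created and iterated inside the timed region, so it measures what a full epoch costs end to end.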