Is there an advantage to DataLoader num_workers > 1 if the data is first loaded onto each CPU?

As the title asks: if I'm loading the data directly into CPU memory, and my Dataset's __getitem__ only indexes an already initialized array (loaded from disk in the __init__ method), is there still an advantage to using more than one worker with num_workers > 1?

It depends on your system, but I would not expect a large speedup, if any. Since __getitem__ is only a cheap indexing operation, there is little work for the workers to parallelize, and using too many workers could even slow down your data loading because of the multiprocessing overhead. In the end, you should profile different num_workers settings for your system and use case, and check which config works best.
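
A rough way to profile this is to time one pass over the DataLoader for a few num_workers values. The snippet below is only a minimal sketch: InMemoryDataset, time_one_epoch, the array shape, and the batch size are made-up placeholders, and the random array stands in for whatever you load from disk in __init__.

```python
import time

import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class InMemoryDataset(Dataset):
    """Hypothetical dataset: all data is created in __init__, __getitem__ only indexes."""

    def __init__(self, num_samples=20_000, num_features=128):
        # Assumption: a random array stands in for "load the array from disk once".
        self.data = np.random.rand(num_samples, num_features).astype(np.float32)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        # Cheap indexing only, no decoding or augmentation.
        return torch.from_numpy(self.data[idx])


def time_one_epoch(dataset, num_workers, batch_size=256):
    """Return the wall-clock seconds needed to iterate over the dataset once."""
    loader = DataLoader(dataset, batch_size=batch_size, num_workers=num_workers)
    start = time.perf_counter()
    for _ in loader:
        pass
    return time.perf_counter() - start


if __name__ == "__main__":
    dataset = InMemoryDataset()
    for num_workers in (0, 2, 4):
        seconds = time_one_epoch(dataset, num_workers)
        print(f"num_workers={num_workers}: {seconds:.3f} s per epoch")
```

Note that the measured time includes starting the worker processes, so for a fair comparison you might want to run a few epochs per setting and use your real dataset and batch size.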