How to choose the value of num_workers for DataLoader

I run models on a machine with an 8-core CPU and an NVIDIA V100. How should I choose num_workers so that data is loaded efficiently?


Unfortunately, there is no universally correct value; it depends on your setup.
With a low number of workers, you should see an improvement each time you add one. At some point, adding more workers stops helping and performance stays the same. Use the lowest number that gives you good performance.
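To make the "add workers until it stops improving" approach concrete, here is a minimal sketch that times one pass over a dataset for several num_workers values and picks the smallest one within a tolerance of the fastest. SyntheticDataset, time_loader, pick_num_workers and the 5% tolerance are all hypothetical choices for illustration; swap in your real Dataset and, ideally, your real training step. Note that each timing also includes worker start-up.

```python
import time

import torch
from torch.utils.data import DataLoader, Dataset


class SyntheticDataset(Dataset):
    """Hypothetical stand-in for a real dataset: each item costs some CPU work."""

    def __init__(self, n_items: int = 4096, dim: int = 1024):
        self.n_items = n_items
        self.dim = dim

    def __len__(self) -> int:
        return self.n_items

    def __getitem__(self, idx: int):
        # Simulated per-sample CPU preprocessing (normalisation).
        x = torch.randn(self.dim)
        x = (x - x.mean()) / (x.std() + 1e-6)
        return x, idx % 10


def time_loader(dataset: Dataset, num_workers: int, batch_size: int = 64) -> float:
    """Seconds needed to iterate once over `dataset` with `num_workers` workers."""
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True,
                        num_workers=num_workers,
                        pin_memory=torch.cuda.is_available())
    start = time.perf_counter()
    for _batch in loader:
        pass  # a real benchmark would also run the training step here
    return time.perf_counter() - start


def pick_num_workers(timings: dict, tolerance: float = 0.05) -> int:
    """Smallest worker count whose time is within `tolerance` of the fastest."""
    best = min(timings.values())
    return min(n for n, t in timings.items() if t <= best * (1 + tolerance))


if __name__ == "__main__":
    ds = SyntheticDataset()
    timings = {n: time_loader(ds, n) for n in (0, 1, 2, 4, 6, 8)}
    for n, secs in timings.items():
        print(f"num_workers={n}: {secs:.2f} s/epoch")
    print("suggested num_workers:", pick_num_workers(timings))
```

The tolerance keeps you from paying for extra worker processes that only win by noise.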


Many thanks! Actually, I think there is a bottleneck in my training process and I'm trying to find it. I am training a 3-layer MLP, but training is unreasonably slow.
Although I do some data preprocessing in the training loop, I don't think it affects the training time much, because the preprocessing should be accelerated by CUDA.

So I wonder whether the data loader is working efficiently. The data loader loads a np.array object in the __getitem__ method.
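For reference, a Dataset that serves rows of a NumPy array might look like the minimal sketch below, assuming the arrays fit in memory and are loaded once up front rather than read from disk on every call. The class name ArrayDataset and the features/labels layout are hypothetical. torch.from_numpy wraps each row without copying it, so the per-item cost stays small.

```python
import numpy as np
import torch
from torch.utils.data import Dataset


class ArrayDataset(Dataset):
    """Serves rows of in-memory NumPy arrays (hypothetical features/labels layout)."""

    def __init__(self, features: np.ndarray, labels: np.ndarray):
        assert len(features) == len(labels), "features and labels must align"
        self.features = features  # loaded once, not per item
        self.labels = labels

    def __len__(self) -> int:
        return len(self.features)

    def __getitem__(self, idx: int):
        # torch.from_numpy shares memory with the array row instead of copying it.
        x = torch.from_numpy(self.features[idx])
        y = int(self.labels[idx])
        return x, y


if __name__ == "__main__":
    X = np.random.rand(1000, 128).astype(np.float32)
    y = np.random.randint(0, 10, size=1000)
    ds = ArrayDataset(X, y)
    print(len(ds), ds[0][0].shape, ds[0][1])
```

If __getitem__ instead calls something like np.load for each sample, that disk I/O is repeated for every item and is a likely place for a bottleneck.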

The weirdest thing is that training seems to get slower epoch by epoch. I call torch.cuda.empty_cache(), but it doesn't help much.
I have also implemented preloading, so the GPU should not be starving.
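One way to check whether the GPU is really not starving is to split each epoch's wall time into time spent waiting for the next batch and time spent in the training step. The sketch below does this; profile_epoch, make_mlp and all layer sizes are hypothetical. Because CUDA kernels run asynchronously, it calls torch.cuda.synchronize() before reading the clock, otherwise compute time would be under-counted. If data_time grows from epoch to epoch, the loader side is the place to look; if compute_time grows, look at the training step itself.

```python
import time

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def make_mlp(in_dim: int, hidden: int, out_dim: int) -> nn.Module:
    """A 3-layer MLP (hypothetical sizes)."""
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))


def profile_epoch(model, loader, optimizer, loss_fn, device):
    """Return (data_time, compute_time) in seconds for one epoch."""
    data_time = compute_time = 0.0
    model.train()
    end = time.perf_counter()
    for x, y in loader:
        x = x.to(device, non_blocking=True)
        y = y.to(device, non_blocking=True)
        start = time.perf_counter()
        data_time += start - end  # waiting on the loader (plus issuing the copy)

        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
        if device.type == "cuda":
            torch.cuda.synchronize()  # wait for async kernels before timing
        end = time.perf_counter()
        compute_time += end - start
    return data_time, compute_time


if __name__ == "__main__":
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    ds = TensorDataset(torch.randn(8192, 128), torch.randint(0, 10, (8192,)))
    loader = DataLoader(ds, batch_size=256, shuffle=True, num_workers=4,
                        pin_memory=device.type == "cuda")
    model = make_mlp(128, 256, 10).to(device)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    for epoch in range(3):
        d, c = profile_epoch(model, loader, opt, nn.CrossEntropyLoss(), device)
        print(f"epoch {epoch}: data {d:.3f} s, compute {c:.3f} s")
```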

Oh, another question: is num_workers related to multi-threading or multi-processing?

It is multi-processing: num_workers is the number of worker subprocesses used for data loading (with num_workers=0, data is loaded in the main process).
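A quick way to see this for yourself is to record the process ID that produced each sample. In the sketch below (PidDataset and loader_pids are hypothetical names), torch.utils.data.get_worker_info() returns None in the main process and a worker descriptor inside a worker. With num_workers=0 every sample comes from the main process; with num_workers=2 the samples come from two other process IDs.

```python
import os

import torch
from torch.utils.data import DataLoader, Dataset, get_worker_info


class PidDataset(Dataset):
    """Toy dataset whose items record which process produced them."""

    def __len__(self) -> int:
        return 32

    def __getitem__(self, idx: int):
        info = get_worker_info()  # None when loading in the main process
        worker_id = -1 if info is None else info.id
        return torch.tensor([os.getpid(), worker_id])


def loader_pids(num_workers: int) -> set:
    """Distinct process IDs that produced the samples."""
    loader = DataLoader(PidDataset(), batch_size=4, num_workers=num_workers)
    pids = set()
    for batch in loader:
        pids.update(batch[:, 0].tolist())
    return pids


if __name__ == "__main__":
    print("main process:", os.getpid())
    print("num_workers=0:", loader_pids(0))
    print("num_workers=2:", loader_pids(2))
```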

I've seen a general recommendation of using 4 workers per GPU, and it works really well with my own setup, but that might not be universal. @albanD's method (adding more workers until performance peaks) is probably the best way to find what works for you.
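As a starting point before benchmarking, the 4-workers-per-GPU rule of thumb can be written down as below. Capping the value at the number of CPU cores is my own assumption (each worker is a separate process), and heuristic_num_workers is a hypothetical helper. For the original setup (one V100, 8 cores) it gives 4.

```python
import os

import torch


def heuristic_num_workers(num_gpus: int, cpu_cores: int, per_gpu: int = 4) -> int:
    """Rule of thumb: `per_gpu` workers per GPU, capped at the CPU core count."""
    wanted = per_gpu * max(num_gpus, 1)  # treat a CPU-only machine as one device
    return max(1, min(wanted, cpu_cores))


if __name__ == "__main__":
    gpus = torch.cuda.device_count()
    cores = os.cpu_count() or 1
    print("starting num_workers:", heuristic_num_workers(gpus, cores))
```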


You should not call torch.cuda.empty_cache(): it can only slow down your code.
If GPU utilization is already at 99%, then I'm afraid there isn't much more you can do.
You could try the bottleneck tool (torch.utils.bottleneck) to get more information.