How to choose num_workers when using DDP?


I’m using DDP on a machine with 64 vCPUs and 8 GPUs.
In what range should I tune the num_workers of the data loader? Since the script runs 8 times on the machine (one process per GPU), is my understanding correct that, to avoid CPU contention, num_workers should be <= cpu_count / 8?
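The rule of thumb in the question can be written as a small helper. This is a minimal sketch assuming one process per GPU (as launched by torchrun with --nproc_per_node=8) and a CPU-bound data loader. The function name cpu_bound_num_workers is hypothetical, and the sketch does not reserve a core for each process's own training loop, which you may want to do in practice.

```python
import os


def cpu_bound_num_workers(cpu_count: int, procs_per_node: int) -> int:
    """Upper bound on DataLoader num_workers per DDP process (hypothetical helper).

    Splits the node's CPUs evenly across the processes launched on it
    (one per GPU), so that the workers of all processes together do not
    exceed the CPU count. Assumes the loader is CPU-bound.
    """
    if procs_per_node < 1:
        raise ValueError("procs_per_node must be >= 1")
    # Floor division keeps procs_per_node * workers <= cpu_count.
    # A result of 0 means loading happens in the main process.
    return cpu_count // procs_per_node


if __name__ == "__main__":
    # The machine from the question: 64 vCPUs, 8 GPUs, one process per GPU.
    print(cpu_bound_num_workers(64, 8))  # 8

    # At runtime, torchrun sets LOCAL_WORLD_SIZE (processes on this node);
    # fall back to 1 when the script is started without torchrun.
    local_procs = int(os.environ.get("LOCAL_WORLD_SIZE", "1"))
    print(cpu_bound_num_workers(os.cpu_count() or 1, local_procs))
```

For the setup above this gives 64 // 8 = 8 workers per process, i.e. 64 worker processes in total across the 8 DDP ranks.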

Yes, this is correct if your dataloader is CPU-intensive. If the dataloader is IO-intensive, you can use more num_workers to load data from the filesystem or network with a higher degree of parallelism, up to the point where you saturate the filesystem or network bandwidth.
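For the IO-bound case there is no fixed formula, so one option is to measure. The sketch below is an illustration under stated assumptions, not an established recipe: measure_throughput times batches from any iterable loader after a short warm-up (so worker startup is not counted), and pick_num_workers returns the smallest worker count whose throughput is within a tolerance of the best, on the idea that once storage or network bandwidth is saturated, extra workers stop helping. The function names, the make_loader factory, and the 5% tolerance are all hypothetical.

```python
import itertools
import time
from typing import Callable, Dict, Iterable


def measure_throughput(make_loader: Callable[[int], Iterable], num_workers: int,
                       num_batches: int = 50, warmup: int = 5) -> float:
    """Batches per second from make_loader(num_workers), excluding warm-up batches."""
    it = iter(make_loader(num_workers))
    # Skip the first few batches so worker startup time is not measured.
    for _ in itertools.islice(it, warmup):
        pass
    start = time.perf_counter()
    count = sum(1 for _ in itertools.islice(it, num_batches))
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")


def pick_num_workers(throughputs: Dict[int, float], tolerance: float = 0.05) -> int:
    """Smallest num_workers whose throughput is within `tolerance` of the best one."""
    best = max(throughputs.values())
    return min(n for n, t in throughputs.items() if t >= (1 - tolerance) * best)


if __name__ == "__main__":
    # Made-up numbers for illustration only: throughput plateaus after 16 workers.
    measured = {4: 110.0, 8: 205.0, 16: 390.0, 32: 398.0, 48: 401.0}
    print(pick_num_workers(measured))  # 16
```

In a DDP script, make_loader could be something like lambda n: DataLoader(dataset, batch_size=64, sampler=DistributedSampler(dataset), num_workers=n), where dataset and the batch size are your own. Run the sweep with all 8 processes loading at the same time, since they share the same filesystem or network link and a single-process measurement would overstate the available bandwidth.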
