Hi,
I’m using DDP on a machine with 64 vCPUs and 8 GPUs.
What range should I tune the DataLoader's num_workers in? Since the script runs 8 times on the machine (one process per GPU), is my understanding correct that, to avoid CPU contention, num_workers should be <= cpu_count / 8?
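To make the arithmetic in the question concrete, here is a minimal sketch of that per-process budget. The helper name `max_workers_per_process` and its `reserve` parameter are hypothetical and not part of PyTorch. `reserve` is an assumption: it leaves some CPUs per process for the training process's own main thread. The result would be passed as `num_workers` to `torch.utils.data.DataLoader` in each DDP process.

```python
import os
from typing import Optional


def max_workers_per_process(procs_per_node: int,
                            cpu_count: Optional[int] = None,
                            reserve: int = 1) -> int:
    """Hypothetical helper: split the node's CPUs evenly across DDP processes.

    procs_per_node: number of DDP processes on this machine (one per GPU).
    cpu_count: logical CPUs on the machine; defaults to os.cpu_count().
    reserve: CPUs per process left for the main training process
             (an assumption, not a PyTorch rule).
    """
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    # Each process gets an equal share of the CPUs (integer division).
    per_proc = cpu_count // procs_per_node
    # Subtract the reserved CPUs, never going below 0 (0 = load in main process).
    return max(0, per_proc - reserve)


# 64 vCPUs, 8 GPUs -> 8 CPUs per process, 1 kept for the main process.
print(max_workers_per_process(8, cpu_count=64))             # 7
print(max_workers_per_process(8, cpu_count=64, reserve=0))  # 8
```

This only gives an upper bound to start from. The best value within that range would still need to be found empirically, e.g. by timing data loading at a few settings.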