Using DistributedDataParallel with dataloader num_workers > 0

Due to the setup of my Dataset class and the size of the data, I need to set num_workers > 0 for data loading to run efficiently during training.

But I want to further speed up training. I also have 4 Tesla V100 GPUs available. I tried to implement DistributedDataParallel with num_workers > 0 for the dataloader, but it caused my virtual machine to crash.

Is there any suggested way to do this? If not, how can I speed up data loading AND model training simultaneously? My code is running on 24 CPUs (for data loading) and 1 GPU (for training), with 5 seconds per epoch. Ideally, I'd like to try to speed it up by 4-5x.

Thanks in advance.


Yes, multiple DataLoader workers and DDP are compatible and commonly used together. I would recommend debugging the crash you see when num_workers > 0 is set, and running a reference script or two to isolate the issue further. For example, the official ImageNet example supports both DDP and multiple workers and might be a good baseline.
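For reference, here is a minimal sketch of how the two are usually combined. It is not your code: the toy TensorDataset, the linear model, and the helper names `workers_per_rank` and `build_loader` are all made up for illustration. It also assumes the script is launched with torchrun, one process per GPU. Splitting the CPU count across ranks is only a guess at what might help. Each DDP process starts its own set of DataLoader workers, so 4 processes with 24 workers each would mean 96 worker processes, but I can't tell from the post whether that is what crashed your VM.

```python
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler


def workers_per_rank(cpu_count: int, world_size: int) -> int:
    # Hypothetical helper: every DDP process creates its own DataLoader
    # workers, so divide the CPU budget between the processes.
    return max(1, cpu_count // world_size)


def build_loader(dataset, rank, world_size, num_workers, batch_size=32):
    # Hypothetical helper: DistributedSampler gives each rank a disjoint
    # shard of the dataset, so the DataLoader itself must not shuffle.
    sampler = DistributedSampler(
        dataset, num_replicas=world_size, rank=rank, shuffle=True
    )
    loader = DataLoader(
        dataset,
        batch_size=batch_size,
        sampler=sampler,
        num_workers=num_workers,
        pin_memory=torch.cuda.is_available(),
    )
    return loader, sampler


def main():
    # torchrun sets RANK, LOCAL_RANK and WORLD_SIZE for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    world_size = dist.get_world_size()
    torch.cuda.set_device(local_rank)

    # Toy data standing in for the real Dataset class (assumption).
    dataset = TensorDataset(
        torch.randn(1024, 16), torch.randint(0, 2, (1024,))
    )
    num_workers = workers_per_rank(os.cpu_count() or 1, world_size)
    loader, sampler = build_loader(
        dataset, dist.get_rank(), world_size, num_workers
    )

    model = DDP(torch.nn.Linear(16, 2).cuda(local_rank), device_ids=[local_rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.CrossEntropyLoss()

    for epoch in range(2):
        # Reseed the sampler so each epoch uses a different shuffle.
        sampler.set_epoch(epoch)
        for x, y in loader:
            x = x.cuda(local_rank, non_blocking=True)
            y = y.cuda(local_rank, non_blocking=True)
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()

    dist.destroy_process_group()


# Only start training when launched by torchrun (which sets LOCAL_RANK).
if __name__ == "__main__" and "LOCAL_RANK" in os.environ:
    main()
```

With 4 GPUs this would be started as `torchrun --nproc_per_node=4 train.py`, where train.py is a placeholder file name. If a stripped-down script like this runs fine with num_workers > 0, the crash is more likely coming from something specific to your Dataset class or from memory pressure on the VM.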
