Due to the way my Dataset class is set up and the size of the data, I need num_workers > 0 in my DataLoader for data loading to keep pace with training.
But I want to speed up training further, and I also have 4 Tesla V100 GPUs available. I tried to implement DistributedDataParallel with num_workers > 0 in the DataLoader, but it crashed my virtual machine; a sketch of my attempt is below.
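For reference, here is a minimal sketch of roughly what my DDP attempt looked like (MyDataset and MyModel are placeholders for my actual classes, and the batch size and port number are arbitrary):

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler

def train(rank, world_size):
    # One process per GPU; rendezvous over localhost
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    dataset = MyDataset()  # placeholder for my custom Dataset
    sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank)
    # 24 CPUs across 4 processes -> 6 workers each
    loader = DataLoader(dataset, batch_size=64, sampler=sampler,
                        num_workers=6, pin_memory=True)

    model = DDP(MyModel().to(rank), device_ids=[rank])  # placeholder model
    # ... training loop over `loader` ...

    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = 4  # one process per V100
    mp.spawn(train, args=(world_size,), nprocs=world_size)
```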
Is there a recommended way to do this? If not, how can I speed up data loading AND model training simultaneously? My code currently runs on 24 CPUs (for data loading) and 1 GPU (for training) at about 5 seconds per epoch, and ideally I'd like to speed it up by 4-5x.
Thanks in advance.