Due to the way my Dataset class is set up and the size of the data, I need num_workers > 0 in my DataLoader for data loading to keep pace with training.
But I want to speed up training further, and I also have 4 Tesla V100 GPUs available. I tried to implement DistributedDataParallel with num_workers > 0 in the DataLoader, but it crashed my virtual machine; a sketch of my attempt is below.
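For reference, here is a minimal sketch of roughly what my DDP attempt looked like (MyDataset and MyModel are placeholders for my actual classes, and the batch size and port number are arbitrary):

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler

def train(rank, world_size):
    # One process per GPU; rendezvous over localhost
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    dataset = MyDataset()  # placeholder for my custom Dataset
    sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank)
    # 24 CPUs across 4 processes -> 6 workers each
    loader = DataLoader(dataset, batch_size=64, sampler=sampler,
                        num_workers=6, pin_memory=True)

    model = DDP(MyModel().to(rank), device_ids=[rank])  # placeholder model
    # ... training loop over `loader` ...

    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = 4  # one process per V100
    mp.spawn(train, args=(world_size,), nprocs=world_size)
```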
Is there a recommended way to do this? If not, how can I speed up data loading AND model training simultaneously? My code currently runs on 24 CPUs (for data loading) and 1 GPU (for training) at about 5 seconds per epoch, and ideally I'd like to speed it up by 4-5x.
Thanks in advance.