Calls to enumerate(dataloader) are quite slow in my project. I'm loading images stored in LMDB format, and I have multiple LMDBs that I combine with ConcatDataset to create the final dataset. I noticed that reducing num_workers from, say, 6 to 1 or 2 reduces the proportion of time spent in enumerate(), though it also slows the actual loading of images. I was previously passing a WeightedRandomSampler into the DataLoader with shuffle=False; when I tried shuffle=True and sampler=None instead, that also reduced the slowness around enumerate(). Despite these changes, about 20-30% of the training time goes to waiting for enumerate(dataloader) to start each epoch. My code looks like this:
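For context, my dataset/loader construction looks roughly like the sketch below. The LMDB-backed dataset class is simplified here to an in-memory stand-in (the real __getitem__ opens an LMDB environment and decodes an image), but the ConcatDataset + WeightedRandomSampler wiring is the same:

```python
import torch
from torch.utils.data import (ConcatDataset, DataLoader, Dataset,
                              WeightedRandomSampler)

class StandInDataset(Dataset):
    """Stand-in for my LMDB-backed dataset; the real one reads from LMDB."""
    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        # Placeholder (image, label) pair; the real version decodes an image.
        return torch.zeros(3, 8, 8), 0

# Several LMDBs concatenated into the final training set
train_set = ConcatDataset([StandInDataset(100), StandInDataset(50)])

# Original configuration: weighted sampling (shuffle must stay False
# whenever a sampler is passed)
weights = torch.ones(len(train_set))
sampler = WeightedRandomSampler(weights, num_samples=len(train_set))

train_loader = DataLoader(train_set, batch_size=16, num_workers=6,
                          sampler=sampler, shuffle=False)
```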
```python
for epoch in range(start_epoch, total_epochs + 1):
    for _, train_data in enumerate(train_loader):
        # do work
```
I have confirmed that the hanging/slowness is due to enumerate itself and not the work being done during each epoch. Looking at the documentation (https://pytorch.org/docs/stable/data.html), I can see that the slowness is likely due to a large amount of work being done in each call to enumerate():
> In this mode, each time an iterator of a DataLoader is created (e.g., when you call enumerate(dataloader)), num_workers worker processes are created. At this point, the dataset, collate_fn, and worker_init_fn are passed to each worker, where they are used to initialize, and fetch data. This means that dataset access together with its internal IO, transforms (including collate_fn) runs in the worker process. … Workers are shut down once the end of the iteration is reached, or when the iterator becomes garbage collected.
My question is, what can we do to eliminate the slowness around enumerate? Is it possible to keep those processes generated in the first call to enumerate and pass them a different shuffling of the data? Or any other ideas?
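One thing I'm considering trying, if I'm reading the docs right: DataLoader gained a persistent_workers flag in PyTorch 1.7 that keeps the worker processes alive across epochs instead of shutting them down when the iterator is exhausted, which sounds like the "keep those processes" behavior I'm asking about (the sampler is still re-consumed each epoch, so each epoch still gets a fresh shuffle). A minimal sketch with a toy dataset standing in for mine:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

data = TensorDataset(torch.arange(100).float().unsqueeze(1))

# persistent_workers=True (PyTorch >= 1.7) keeps worker processes alive
# between epochs, so only the first enumerate() pays the startup cost.
loader = DataLoader(data, batch_size=10, shuffle=True,
                    num_workers=2, persistent_workers=True)

for epoch in range(3):
    for _, batch in enumerate(loader):  # workers spawn once, on the first epoch
        pass  # do work
```

I don't know yet whether this helps with my LMDB setup specifically (each worker would still hold its LMDB handles open), but it seems worth testing.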
Other info: I’m using the Apex mixed precision package, training on V100 GPUs, and using torch.nn.parallel.DistributedDataParallel on 4 GPUs. Trying apex.parallel.DistributedDataParallel did not significantly reduce the slowness. The hang is still significant when training on one or two GPUs.