Hey @ammary-mo, how did you measure the delay? Since DataParallel and DistributedDataParallel are only involved in the forward and backward passes, could you please try using elapsed_time to break the delay down into data loading, forward, and backward time? See the following discussion. If multiple DDP processes read from the same file, contention might cause a data loading performance regression. If that's the case, the solution would be to implement a more performant data loader.
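In case it helps, here is a rough sketch of the kind of breakdown I mean. It is not from your code: `profile_steps`, `forward`, `backward`, and `sync` are hypothetical names, and it uses plain wall-clock timing with an optional synchronize hook so it runs anywhere. On a GPU the kernels are launched asynchronously, so you would need to pass something like `torch.cuda.synchronize` as `sync`, or record `torch.cuda.Event(enable_timing=True)` events and read `start.elapsed_time(end)` instead.

```python
import time


def profile_steps(batches, forward, backward, sync=lambda: None):
    """Return total seconds spent in data loading, forward, and backward.

    batches  : any iterable of batches (e.g. a DataLoader)
    forward  : callable taking a batch and returning an output/loss
    backward : callable taking that output and running the backward pass
    sync     : hypothetical hook called before each clock read; on a GPU
               this is assumed to be something like torch.cuda.synchronize
    """
    totals = {"data": 0.0, "forward": 0.0, "backward": 0.0}
    it = iter(batches)
    while True:
        sync()
        t0 = time.perf_counter()
        try:
            batch = next(it)  # time spent waiting on the loader
        except StopIteration:
            break
        t1 = time.perf_counter()

        out = forward(batch)
        sync()  # wait for queued device work before reading the clock
        t2 = time.perf_counter()

        backward(out)
        sync()
        t3 = time.perf_counter()

        totals["data"] += t1 - t0
        totals["forward"] += t2 - t1
        totals["backward"] += t3 - t2
    return totals


if __name__ == "__main__":
    # Stand-in workload: a deliberately slow "loader" and trivial passes.
    def slow_batches(n=5, delay=0.01):
        for i in range(n):
            time.sleep(delay)  # simulates slow file reads
            yield i

    print(profile_steps(slow_batches(), lambda b: b * 2, lambda out: None))
```

If the "data" share grows as you add processes while "forward" and "backward" stay roughly flat, that would point at the loader rather than DDP. Note that under DDP the gradient all-reduce overlaps with the backward pass, so it would show up in the "backward" bucket here.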
cc @VitalyFedyunin @glaringlee for DataLoader and DataSampler.