Iterating batch in DataLoader takes more time after every nth iteration

My data is both image and the compressed(.npz) files that contain 3d points and labels for each 3d points(contains a minimum of 646464 points and each file size varies from 10MB to 45MB). num_workers=4, batch_size=32, points_sampled for every batch is 2048.

Total number of parameters: 13414465

train batch 1 load time 21.43259024620056s
train batch 2 load time 0.031423091888427734s
train batch 3 load time 0.004406452178955078
train batch 4 load time 0.004347562789916992
train batch 5 load time 18.13344931602478
train batch 6 load time 0.004399538040161133
train batch 7 load time 0.03353142738342285
train batch 8 load time 0.004467010498046875
train batch 9 load time 16.202253103256226
train batch 10 load time 0.8377358913421631

I understand that input file is too large and I load it from a network location, this could cause the slowdown.
Questions are…

  1. Why is it slow only after #num_workers iteration?
  2. How can we make it faster loading?


  1. Each worker preloads a complete batch. If all start at the same time, they might finish close to each other and in your case 4 batches might be ready. Your actual model workload seems to be small in comparison to the data loading so that the training using these 4 batches finishes quickly. Meanwhile the workers already started to load the new batches, but cannot keep up with the model training so you have clearly a data loading bottleneck in your code.

  2. Have a look at this post for a general explanation and some advice.

1 Like

Thank you @ptrblck for the answer. Does this mean that using maximum possible num_workers is efficient? assuming some data loading bottleneck remains.

I think you would get the best performance using a “sane” number of workers, which depend on your overall system. E.g. 4 workers might work well for local workstations, but might not be enough for bigger servers.

1 Like

Experimented and understood, thank you!

the default value of prefetch_factor is 2, why the iteration is num_workers instead of prefetch_factor * num_workers.i’m confused

The prefetch_factor doesn’t accelerate the data loading so if the workers are too slow and each worker can finish to load a single batch, the slowdown will happen after these 4 batches are consumed as described in the previous post.

thank you for your answer, In the data loader source code, prefetch_ factor acts on _reset function.I increased the prefetch_factor, resulting in more obvious periodicity. As you say, prefetch_factor cannot accelerate the data loading then what is it real role.

It’s role is to create a “buffer” by adding samples to a queue so that the training doesn’t stall assuming:

  • the data loading is fast enough and
  • the model training workload is large enough compared to the data loading.

It seems the previous example might still be unclear so try to imagine the timeline if your workload uses:

  • 10 seconds to load a single batch
  • 0.25s second to train the model (forward/backward/step) using a single batch.

In this case, the initial “warmup” time will take ~10s to allow each worker to load a single batch.
The queue will now be filled with 4 ready batches. The workers will immediately start loading the next batch (as the prefetching kicks in) which will take another ~10s.
The model finishes the training using all 4 batches in 1s (4x0.25s) and has to wait for the next data to be ready. Since that’s not the case, you will see a stall in the training regardless how high the prefetch_factor is set.

a good example helps i understand your what you mean.
but i am still confused that prefetch_factor’s action position.
the experiments’ factor: num_process =16, prefetch_factor =2
the first 32 batches took less time, i think prefetch_factor works,but after that the iteration is 16 = num_workers
so im confused that it just work in the initial.Or it works in the whole training, i understand in the wrong way.

Sorry, the image is unfortunately too small and I’m unsure what exactly it shows. Could you upload a larger version somewhere?

xlabel batch_index
ylabel dataloader_itertime