DataLoader glitches every once in a while when num_workers > 0

I'm seeing the following behavior with DataLoader:

  • If I set num_workers > 0, most iterations load very quickly (perhaps due to caching), but every X iterations there is a glitch where loading takes quite some time (perhaps due to the preprocessing).
  • Meanwhile, if I set num_workers = 0, every iteration takes slightly longer to load, but the time is consistent.

For example, here are some benchmarks (average data loading time per step, computed over each window of 100 iterations):

  • num_workers = 8:
    • up to the 800th step: avg. 0.0004s / step
    • 800th-900th step: avg. 0.65s / step (note that the glitch probably happens in only one of these steps)
  • num_workers = 0:
    • avg. 0.015s / step
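Because a 100-step average cannot show whether the 0.65s comes from one long stall or is spread across the window, it may help to record both the mean and the maximum wait per window. The sketch below is an assumption-based helper (the name `windowed_load_times` is hypothetical); it works on any iterable, including a PyTorch `DataLoader`, and times only the `next()` call, i.e. the time the loop spends waiting for data.

```python
import time
from typing import Iterable, List, Optional, Tuple


def windowed_load_times(
    loader: Iterable,
    window: int = 100,
    max_steps: Optional[int] = None,
) -> List[Tuple[float, float]]:
    """Return (mean, max) seconds spent waiting for a batch, per window of steps."""
    results: List[Tuple[float, float]] = []
    waits: List[float] = []
    # For a DataLoader with num_workers > 0, worker start-up happens inside iter(),
    # so it is NOT part of the timed region below.
    it = iter(loader)
    step = 0
    while max_steps is None or step < max_steps:
        start = time.perf_counter()
        try:
            batch = next(it)  # only the wait for the next batch is measured
        except StopIteration:
            break
        waits.append(time.perf_counter() - start)
        # ... the (untimed) training step using `batch` would run here ...
        step += 1
        if len(waits) == window:
            results.append((sum(waits) / window, max(waits)))
            waits = []
    if waits:  # partial final window
        results.append((sum(waits) / len(waits), max(waits)))
    return results


if __name__ == "__main__":
    # Stand-in iterable; replace with e.g. DataLoader(dataset, num_workers=8).
    for i, (mean, worst) in enumerate(windowed_load_times(range(250), window=100)):
        print(f"window {i}: mean {mean:.6f}s, max {worst:.6f}s")
```

If the max for the 800-900 window is close to 100 × 0.65s while the mean of the other steps stays tiny, the cost really is concentrated in a single stall.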

If we do the math for 900 steps:

  • num_workers = 8: 0.0004 * 800 + 0.65 * 100 = 0.32 + 65 = 65.32s
  • num_workers = 0: 0.015 * 900 = 13.5s
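The same totals, recomputed in a few lines of Python as a quick check. The (steps, average seconds per step) pairs are the benchmark numbers listed above; `total_load_time` is just an illustrative helper name.

```python
def total_load_time(segments):
    """Sum of num_steps * avg_seconds_per_step over (num_steps, avg) pairs."""
    return sum(n * avg for n, avg in segments)


workers_8 = total_load_time([(800, 0.0004), (100, 0.65)])
workers_0 = total_load_time([(900, 0.015)])
print(f"{workers_8:.2f}s vs {workers_0:.2f}s")  # 65.32s vs 13.50s
```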

I’m aware that the huge spike is probably because it tries to cache batches for future iterations, but the difference in time is still quite significant, and the inconsistency is surprising to me.

Is this kind of behavior expected?

If the workers aren’t able to pre-load the batches in time, you would see spikes once in a while (or every num_workers iterations), but your issue sounds a bit strange, since the data loading speed tanks towards the end. Are you seeing the same slowdown in each epoch?
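The "spikes every num_workers iterations" pattern can be illustrated with a toy timing model. This is a simplified sketch, not PyTorch's actual scheduler: it assumes batches are assigned round-robin, each worker needs a fixed `load_time` per batch, workers never block on the prefetch limit, and the loop spends a fixed `step_time` per batch. The function name `simulate_waits` is hypothetical.

```python
from typing import List


def simulate_waits(num_batches: int, num_workers: int,
                   load_time: float, step_time: float) -> List[float]:
    """Toy model: seconds the training loop waits for each batch."""
    if num_workers == 0:
        # Loading happens in the main process, so every step pays the full load time.
        return [load_time] * num_batches

    worker_done = [0.0] * num_workers  # when each worker finishes its latest batch
    ready = []
    for i in range(num_batches):
        w = i % num_workers            # round-robin assignment (assumption)
        worker_done[w] += load_time    # worker starts the next batch immediately
        ready.append(worker_done[w])

    clock, waits = 0.0, []
    for t in ready:
        wait = max(0.0, t - clock)     # idle until this batch arrives
        waits.append(wait)
        clock += wait + step_time      # then run the training step
    return waits


if __name__ == "__main__":
    waits = simulate_waits(12, num_workers=4, load_time=1.0, step_time=0.1)
    print([round(w, 2) for w in waits])
    # [1.0, 0.0, 0.0, 0.0, 0.6, 0.0, 0.0, 0.0, 0.6, 0.0, 0.0, 0.0]
```

In this model the first batch of a fresh iterator always waits the full `load_time`, later stalls recur every `num_workers` steps, and the average wait per step (about 0.18s here) stays below the 1.0s of the `num_workers = 0` case. A slowdown that makes the multi-worker run slower overall, as in the numbers above, does not fit this simple picture.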

Yes, in my case it seems to happen consistently every 800 steps when loading data with the DataLoader.