I seem to encounter the following behavior with DataLoader:
- If I set num_workers > 0, most iterations load very quickly (perhaps due to caching), but every X iterations there is a glitch where loading takes quite a long time (perhaps due to preprocessing).
- If I set num_workers = 0, every iteration takes slightly longer to load, but the time is consistent.
For example, here is a benchmark (average data loading time per step, measured over each window of 100 iterations):
- num_workers = 8:
  - up to step 800: avg. 0.0004 s/step
  - steps 800-900: avg. 0.65 s/step (note that the glitch probably happens in only one of these steps)
- num_workers = 0:
  - avg. 0.015 s/step
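To tell a single long stall apart from uniformly slower steps, it may help to record the time of every batch fetch and report the maximum alongside the mean for each 100-step window. Below is a minimal, stdlib-only sketch; `time_batches`, `window_stats`, and the `step_fn` callback are hypothetical helpers (not part of PyTorch), and it assumes the loader is any Python iterable, which includes a `torch.utils.data.DataLoader`.

```python
import time
from statistics import mean
from typing import Any, Callable, Iterable, List, Optional, Tuple


def time_batches(
    loader: Iterable[Any],
    num_steps: int,
    step_fn: Optional[Callable[[Any], None]] = None,
) -> List[float]:
    """Return the wall-clock seconds spent waiting for each batch."""
    times: List[float] = []
    # For a DataLoader with num_workers > 0, worker processes are started
    # when the iterator is created, i.e. before the first timed fetch.
    it = iter(loader)
    for _ in range(num_steps):
        start = time.perf_counter()
        try:
            batch = next(it)
        except StopIteration:
            break  # loader exhausted before num_steps
        times.append(time.perf_counter() - start)
        if step_fn is not None:
            # The training step runs between fetches (so workers can keep
            # loading meanwhile) but is excluded from the measured time.
            step_fn(batch)
    return times


def window_stats(times: List[float], window: int = 100) -> List[Tuple[int, float, float]]:
    """Return (first_step, mean_seconds, max_seconds) for each window of steps."""
    stats = []
    for first in range(0, len(times), window):
        chunk = times[first:first + window]
        stats.append((first, mean(chunk), max(chunk)))
    return stats
```

With a real loader this would be called as `window_stats(time_batches(loader, 900, step_fn=train_step))`, where `train_step` stands for your own (hypothetical) training function. A window whose max is close to `mean * window` is dominated by one stalled step, as suspected for steps 800-900.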
If we do the math for 900 steps:
- num_workers = 8: 0.0004 * 800 + 0.65 * 100 = 0.32 + 65 = 65.32s
- num_workers = 0: 0.015 * 900 = 13.5s
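The totals above can be reproduced with a few lines of Python; `total_load_time` is a hypothetical helper that multiplies each window's average time per step by the number of steps in that window and sums the results.

```python
def total_load_time(windows):
    """Sum of (average seconds per step * number of steps) over all windows."""
    return sum(avg * steps for avg, steps in windows)


workers_8 = total_load_time([(0.0004, 800), (0.65, 100)])
workers_0 = total_load_time([(0.015, 900)])
print(f"num_workers=8: {workers_8:.2f}s")  # prints "num_workers=8: 65.32s"
print(f"num_workers=0: {workers_0:.2f}s")  # prints "num_workers=0: 13.50s"
```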
I’m aware that the huge spike is probably because it is caching data for future iterations, but the difference in total time is still significant, and the inconsistency surprises me.
Is this kind of behavior expected?