A question about the DataLoader delay when re-iterating a dataset during training

Hello everyone,

I noticed something while training a network model: when the dataset iteration finishes in an epoch, the DataLoader spends about 10 s before it starts iterating the dataset again for the next epoch. This can matter a lot for models trained on small datasets.
I define a lot of data augmentation methods in the initialization of the Dataset, set pin_memory=True, and use num_workers=8.
I ran a simple experiment to measure the time needed to initialize the Dataset and DataLoader, but the mean time is only about 0.1–0.15 s.
Do you have any idea what could cause this? Thank you.
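To make this concrete, here is a minimal sketch of how the per-epoch delay can be measured (the `TensorDataset` is only a placeholder for my real Dataset and its augmentations):

```python
import time
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder for my real Dataset; the heavy augmentations are omitted here.
dataset = TensorDataset(torch.randn(1024, 3, 64, 64), torch.randint(0, 10, (1024,)))
loader = DataLoader(dataset, batch_size=32, num_workers=8, pin_memory=True)

if __name__ == "__main__":  # guard needed on platforms that spawn worker processes
    for epoch in range(3):
        t0 = time.time()
        for i, (data, target) in enumerate(loader):
            if i == 0:
                # Time until the first batch of the epoch arrives,
                # i.e. the delay before the dataset is re-iterated.
                print(f"epoch {epoch}: first batch after {time.time() - t0:.2f}s")
```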

In each epoch the workers will have to recreate their batches, so the first iteration might be slower.
Especially if you are using a lot of workers with a large batch size.
IIRC there is an open PR to use multiple workers to create a single batch, which would speed up the first iteration.
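As a side note: if your PyTorch version supports it, the `persistent_workers` argument of `DataLoader` keeps the worker processes alive between epochs instead of respawning them, which avoids most of this per-epoch startup cost. A minimal sketch:

```python
from torch.utils.data import DataLoader

# Assuming `dataset` is your Dataset instance.
# persistent_workers=True keeps the worker processes alive after an epoch ends,
# so the next epoch does not pay the worker startup cost again.
# (Only available in PyTorch versions that provide this argument.)
loader = DataLoader(
    dataset,
    batch_size=32,
    num_workers=8,
    pin_memory=True,
    persistent_workers=True,
)
```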

Thanks for your reply, but what are IIRC and the open PR? I have no idea about those. :joy:

Oh sorry for the abbreviations. :slight_smile:
IIRC - If I recall/remember correctly
PR - Pull request

In fact, it’s still in the discussion stage, e.g. this issue.

OK, I will try it. Thank you! :grinning:

@ptrblck why does the loading time for the first batch increase when increasing the number of worker threads? I am assuming the workers run in parallel, so ideally the loading time for the first batch should not increase significantly with more workers (some increase might be due to the overhead of multi-threading). But I see that the time to load the first batch increases almost linearly with the number of workers.

Multiple workers will all load a batch at the same time, so you might run into a bottleneck reading from your SSD, or your CPU might become the bottleneck.
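A rough sketch to see this effect: time how long the first batch takes for different worker counts (synthetic data, so the absolute numbers will depend on your machine):

```python
import time
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(1024, 3, 64, 64), torch.randint(0, 10, (1024,)))

if __name__ == "__main__":  # guard needed on platforms that spawn worker processes
    for num_workers in (0, 2, 4, 8):
        loader = DataLoader(dataset, batch_size=32, num_workers=num_workers)
        t0 = time.time()
        next(iter(loader))  # fetch only the first batch
        print(f"num_workers={num_workers}: first batch in {time.time() - t0:.2f}s")
```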

Nit: note that multiprocessing is used, not multi-threading.

Why does the bottleneck not affect the later batches? I am sorry, this is still not clear to me.

In the very first iteration all workers need to create a new batch from scratch, while after that the already loaded batches should be waiting in the queue.
As explained before, the more workers you are using, the more pressure you put on your system, and this won't scale beyond a system-specific number of workers.
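A quick way to see the queue at work is to time every batch in one epoch; typically only the very first fetch is slow, since the following batches are already waiting in the prefetch queue (again a sketch with synthetic data):

```python
import time
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(1024, 3, 64, 64), torch.randint(0, 10, (1024,)))
loader = DataLoader(dataset, batch_size=64, num_workers=4)

if __name__ == "__main__":  # guard needed on platforms that spawn worker processes
    t_prev = time.time()
    for i, batch in enumerate(loader):
        print(f"batch {i}: {time.time() - t_prev:.3f}s since the previous batch")
        t_prev = time.time()
```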