I noticed something while training a network model. When the DataLoader finishes iterating over the dataset in an epoch, it spends about 10 s before it starts iterating over the dataset again for the next epoch. This overhead can matter a lot for models trained on a small dataset.
I define a lot of data augmentation methods in the Dataset's initialization, set pin_memory to True, and use num_workers=8.
I ran a simple experiment to measure the time needed to initialize the Dataset and DataLoader, but the mean time is only about 0.1 to 0.15 s.
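Constructing a DataLoader does not start its worker processes; with the default settings they are started each time a new iterator is created at the beginning of an epoch, so timing only the Dataset and DataLoader initialization leaves that cost out. A minimal sketch of a more direct measurement times how long each epoch takes to deliver its first batch. `SlowDataset`, its `delay` parameter, and `epoch_startup_times` are hypothetical names for illustration; swap in your own Dataset to measure the real gap.

```python
import time

import torch
from torch.utils.data import DataLoader, Dataset


class SlowDataset(Dataset):
    """Hypothetical Dataset whose __getitem__ sleeps to mimic costly augmentation."""

    def __init__(self, n=64, delay=0.01):
        self.n = n
        self.delay = delay  # assumed per-sample augmentation cost in seconds

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        time.sleep(self.delay)
        return torch.full((3,), float(idx))


def epoch_startup_times(loader, epochs=3):
    """Seconds from the start of each epoch until its first batch arrives."""
    times = []
    for _ in range(epochs):
        start = time.perf_counter()
        it = iter(loader)  # with num_workers > 0 and default settings, this starts the workers
        next(it)           # block until the first batch is ready
        times.append(time.perf_counter() - start)
        for _ in it:       # consume the rest of the epoch
            pass
    return times


if __name__ == "__main__":
    loader = DataLoader(
        SlowDataset(),
        batch_size=8,
        num_workers=8,
        pin_memory=torch.cuda.is_available(),  # pinning only helps when a GPU is present
    )
    for epoch, t in enumerate(epoch_startup_times(loader)):
        print(f"epoch {epoch}: first batch after {t:.3f} s")
```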
Do you have any idea what causes this? Thank you.
In each epoch the workers have to create their batches again, so the first iteration might be slower, especially if you are using many workers with a large batch size.
IIRC there is an open PR to use multiple workers to create a single batch, which would speed up the first iteration.
@ptrblck why does the loading time for the first batch increase as I increase the number of worker threads? I am assuming the workers run in parallel, so ideally the loading time for the first batch should not increase significantly with more workers (some increase might be due to multi-threading overhead). However, I see that the time to load the first batch increases almost linearly with the number of workers.
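A minimal sketch to reproduce this kind of measurement, assuming CPU-bound per-sample work: it builds a fresh DataLoader for each `num_workers` value and times its first batch. `CpuBoundDataset`, `work`, `first_batch_latency`, and `sweep_workers` are hypothetical; the busy loop only approximates heavy augmentation, and the shape of the curve will depend on your core count and storage.

```python
import time

import torch
from torch.utils.data import DataLoader, Dataset


class CpuBoundDataset(Dataset):
    """Hypothetical Dataset; each sample burns CPU to mimic heavy augmentation."""

    def __init__(self, n=512, work=10_000):
        self.n = n
        self.work = work  # assumed amount of per-sample CPU work

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        acc = sum(i * i for i in range(self.work))  # CPU-bound busy work
        return torch.tensor([float(idx), float(acc % 7)])


def first_batch_latency(dataset, num_workers, batch_size=32):
    """Seconds from creating a fresh iterator until its first batch is returned."""
    loader = DataLoader(dataset, batch_size=batch_size, num_workers=num_workers)
    start = time.perf_counter()
    it = iter(loader)  # starts num_workers processes; each begins building its own batch
    next(it)
    elapsed = time.perf_counter() - start
    del it             # drop the iterator so its worker processes are shut down
    return elapsed


def sweep_workers(dataset, worker_counts, batch_size=32):
    """Map each num_workers value to its first-batch latency."""
    return {w: first_batch_latency(dataset, w, batch_size) for w in worker_counts}


if __name__ == "__main__":
    results = sweep_workers(CpuBoundDataset(), [0, 1, 2, 4, 8])
    for w, t in results.items():
        print(f"num_workers={w}: first batch after {t:.3f} s")
```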
All workers load a batch at the same time, so you might run into a bottleneck reading from your SSD, or your CPU might become the bottleneck.
NIT: note that multiprocessing is used, not multi-threading.
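A quick way to see this is to record the process ID inside `__getitem__`. In this sketch (the `PidDataset` and `sample_origins` names are hypothetical), `torch.utils.data.get_worker_info()` returns `None` in the main process and a worker-info object inside a worker, so samples loaded with `num_workers > 0` should report PIDs different from the main process.

```python
import os

import torch
from torch.utils.data import DataLoader, Dataset, get_worker_info


class PidDataset(Dataset):
    """Hypothetical Dataset that reports which process produced each sample."""

    def __len__(self):
        return 8

    def __getitem__(self, idx):
        info = get_worker_info()  # None in the main process, set inside a worker
        worker_id = -1 if info is None else info.id
        return torch.tensor([worker_id, os.getpid()])


def sample_origins(num_workers):
    """Return the set of (worker_id, pid) pairs that produced the samples."""
    loader = DataLoader(PidDataset(), batch_size=2, num_workers=num_workers)
    origins = set()
    for batch in loader:
        for worker_id, pid in batch.tolist():
            origins.add((worker_id, pid))
    return origins


if __name__ == "__main__":
    print("main process pid:", os.getpid())
    print(sample_origins(num_workers=2))
```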
After the first iteration, the workers should already have added their loaded batches to the queue, whereas in the very first iteration all workers need to create a new batch at once.
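The queue effect can be sketched by timing how long the loop waits for each batch while a fake training step (`time.sleep(step_time)`) runs. Under this assumption, the first wait includes worker startup plus building a full batch, while later waits should be close to zero as long as the workers keep up. `SlowDataset` and `wait_times` are hypothetical stand-ins.

```python
import time

import torch
from torch.utils.data import DataLoader, Dataset


class SlowDataset(Dataset):
    """Hypothetical Dataset whose samples take `delay` seconds each to produce."""

    def __init__(self, n=64, delay=0.01):
        self.n = n
        self.delay = delay

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        time.sleep(self.delay)
        return torch.zeros(3)


def wait_times(loader, step_time=0.1):
    """Seconds the loop spends waiting for each batch while a fake training step runs."""
    waits = []
    it = iter(loader)
    while True:
        start = time.perf_counter()
        try:
            next(it)
        except StopIteration:
            break
        waits.append(time.perf_counter() - start)
        time.sleep(step_time)  # stand-in for forward/backward; workers keep filling the queue
    return waits


if __name__ == "__main__":
    loader = DataLoader(SlowDataset(), batch_size=8, num_workers=4)
    waits = wait_times(loader)
    print(f"first batch: {waits[0]:.3f} s, later batches (max): {max(waits[1:]):.3f} s")
```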
As explained before, the more workers you use, the more pressure you put on your system, and this won't scale beyond a system-specific number of workers.