Strange behavior in data loader with workers

Each worker will load a batch in the background. Once its batch is returned, the worker will start loading the next batch.
Since you are seeing a slowdown every num_workers batches, this points towards a data loading bottleneck (or a tiny model whose forward/backward pass finishes faster than the workers can load the next batches).

I.e. the first iteration (batch 0) has to wait until the first complete batch is loaded. Once it’s loaded, it’ll be returned, and after the forward/backward pass is done, the next iteration will start.
The other workers seem to have already loaded their batches in the meantime and will return them immediately (batches 1 to 9 with 10 workers).
In batch 10, the first worker is supposed to yield its next batch, but since the previous 9 iterations were really fast, it hasn't finished loading yet. You will thus see the slowdown again until all samples for that batch are loaded.
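
To make the pattern above concrete, here is a small pure-Python simulation of it. This is only a simplified sketch and not how the DataLoader is actually implemented: the function name simulate_waits and the numbers (10 workers, 1000 ms to load a batch, 10 ms per training step) are assumptions for illustration, and it ignores prefetching, so a worker only starts its next batch once the previous one has been returned.

```python
def simulate_waits(num_workers, load_ms, step_ms, num_batches):
    """Return how long (in ms) the training loop waits for each batch.

    Simplified model (an assumption, not the real DataLoader logic):
    worker k loads batches k, k + num_workers, k + 2 * num_workers, ...
    and starts its next batch only after its previous one was returned.
    """
    # ready_at[i] is the time at which batch i has finished loading.
    # All workers start loading their first batch at t = 0.
    ready_at = {i: load_ms for i in range(num_workers)}
    now = 0
    waits = []
    for i in range(num_batches):
        # Block until batch i is available (no wait if it is already loaded).
        wait = max(0, ready_at[i] - now)
        now += wait
        waits.append(wait)
        # The worker that produced batch i starts on batch i + num_workers.
        ready_at[i + num_workers] = now + load_ms
        # Forward/backward pass of the (tiny) model.
        now += step_ms
    return waits


if __name__ == "__main__":
    waits = simulate_waits(num_workers=10, load_ms=1000, step_ms=10, num_batches=21)
    for i, w in enumerate(waits):
        print(f"batch {i:2d}: waited {w} ms")
```

With these assumed numbers the loop waits 1000 ms for batch 0, 0 ms for batches 1 to 9, and 900 ms again for batch 10, i.e. the same periodic slowdown. If you raise step_ms to load_ms / num_workers or more (100 ms here), the workers keep up and only the very first batch causes a wait.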