DataLoader num_workers > batch_size

Hi,
I put a print() inside the dataloader. To my great surprise, with num_workers=10 and batch_size=1 I saw the dataloader run 10 times. It returned only one sample, but ran 10 times.

In short, were all those loaded samples lost? Did the dataloader's index treat them as already taken?

Why don’t you cap the number of workers at <= the batch size in the source?

Each worker loads a complete batch, not a single sample.
In your example, the other 9 workers pre-loaded the following 9 batches, so those samples are not lost: they are returned on the subsequent iterations.
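
To illustrate the idea, here is a small pure-Python toy model. It is not PyTorch's actual implementation, and `simulate_loader` and `prefetch_per_worker` are hypothetical names made up for this sketch. It assumes batches are handed out in order, that every worker gets work before the first batch is returned, and that a freed worker immediately picks up the next pending batch.

```python
from collections import deque


def simulate_loader(num_samples, batch_size, num_workers, prefetch_per_worker=1):
    """Toy model of worker prefetching (hypothetical, not PyTorch internals)."""
    # Split the sample indices into batches, in order.
    indices = list(range(num_samples))
    pending = deque(
        indices[i:i + batch_size] for i in range(0, num_samples, batch_size)
    )
    loaded_log = []  # every sample index loaded so far, in load order
    ready = deque()  # batches already loaded and waiting to be returned

    def dispatch():
        # Hand the next pending batch to a worker, which loads ALL its samples.
        if pending:
            batch = pending.popleft()
            loaded_log.extend(batch)  # stands in for the print() in the question
            ready.append(batch)

    # Assumption: every worker receives work before the first batch is returned.
    for _ in range(num_workers * prefetch_per_worker):
        dispatch()

    while ready:
        batch = ready.popleft()
        # Report the batch and how many samples had been loaded at that moment.
        yield batch, len(loaded_log)
        dispatch()  # the freed worker starts on the next pending batch


results = list(simulate_loader(num_samples=25, batch_size=1, num_workers=10))
first_batch, loaded_at_first_return = results[0]
returned = [i for batch, _ in results for i in batch]

print(first_batch, loaded_at_first_return)  # [0] 10
print(returned == list(range(25)))          # True
```

In this model, 10 samples have been loaded by the time the first one is returned, yet every index still comes back exactly once and in order. In the real `DataLoader`, the number of batches loaded ahead is governed by the `prefetch_factor` argument (per worker, default 2 when `num_workers > 0`), so the exact count you see printed can differ from this sketch.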

If I recall correctly, there was a discussion about loading a single batch using multiple workers, but I’m not sure about the status of this feature request.
