I’m curious how the dataloader behaves differently if it’s instructed to use multiple workers but also told that the dataset will take care of batching. Does this mean each worker will load a separate batch?
I would expect to see the same behavior, i.e. each worker is responsible for loading, processing, and creating a single batch. If your `Dataset` is now responsible for creating the full batch already, I guess the number of calls into `Dataset.__getitem__` would differ:

- In the default use case, each worker would call into `__getitem__` `batch_size` times to load and process each sample. Afterwards, it would create the batch in its `collate_fn`.
- In your use case, I assume each worker calls into `__getitem__` once and uses the `collate_fn` to create/process the batch samples afterwards.
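To make the second case concrete, here is a minimal sketch (the `BatchDataset` name and slicing scheme are my own illustration, not from the thread): the `Dataset` returns a whole batch per index, and passing `batch_size=None` to the `DataLoader` disables automatic batching, so each worker makes a single `__getitem__` call per batch instead of `batch_size` calls.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class BatchDataset(Dataset):
    """Illustrative dataset where each __getitem__ returns a full batch."""
    def __init__(self, data, batch_size):
        self.data = data
        self.batch_size = batch_size

    def __len__(self):
        # length is the number of batches, not the number of samples
        return (len(self.data) + self.batch_size - 1) // self.batch_size

    def __getitem__(self, idx):
        # one call loads/processes an entire batch via slicing
        start = idx * self.batch_size
        return self.data[start:start + self.batch_size]

data = torch.arange(100.0)
ds = BatchDataset(data, batch_size=10)

# batch_size=None disables automatic batching: the default collate_fn
# only converts the already-batched sample, and each worker calls
# __getitem__ once per batch.
loader = DataLoader(ds, batch_size=None, num_workers=2)
shapes = [batch.shape[0] for batch in loader]
print(shapes)  # each yielded element is already a full batch of 10 samples
```

With automatic batching enabled instead (e.g. `batch_size=10` and a per-sample `__getitem__`), the same 10 batches would cost 100 `__getitem__` calls rather than 10.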