I’m curious how the dataloader behaves differently if it’s instructed to use multiple workers but also told that the dataset will take care of batching. Does this mean each worker will load a separate batch?
I would expect to see the same behavior, i.e. each worker is responsible for loading, processing, and creating a single batch. If your `Dataset` is now responsible for creating the full batch already, I guess the number of calls into `Dataset.__getitem__` would differ:

- In the default use case, each worker would call into `__getitem__` `batch_size` times to load and process each sample. Afterwards, it would create the batch in its `collate_fn`.
- In your use case, I assume each worker calls into `__getitem__` once and uses the `collate_fn` to create/process the batch samples afterwards.
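To make the second case concrete, here is a minimal sketch (the `BatchDataset` name and slicing scheme are my own illustration, not from the thread): the `Dataset` returns a whole batch per index, and passing `batch_size=None` to the `DataLoader` disables automatic batching, so each worker makes a single `__getitem__` call per batch instead of `batch_size` calls.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class BatchDataset(Dataset):
    """Illustrative dataset where each __getitem__ returns a full batch."""
    def __init__(self, data, batch_size):
        self.data = data
        self.batch_size = batch_size

    def __len__(self):
        # length is the number of batches, not the number of samples
        return (len(self.data) + self.batch_size - 1) // self.batch_size

    def __getitem__(self, idx):
        # one call loads/processes an entire batch via slicing
        start = idx * self.batch_size
        return self.data[start:start + self.batch_size]

data = torch.arange(100.0)
ds = BatchDataset(data, batch_size=10)

# batch_size=None disables automatic batching: the default collate_fn
# only converts the already-batched sample, and each worker calls
# __getitem__ once per batch.
loader = DataLoader(ds, batch_size=None, num_workers=2)
shapes = [batch.shape[0] for batch in loader]
print(shapes)  # each yielded element is already a full batch of 10 samples
```

With automatic batching enabled instead (e.g. `batch_size=10` and a per-sample `__getitem__`), the same 10 batches would cost 100 `__getitem__` calls rather than 10.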