DataLoader worker thread synchronization

I have a general query about how the DataLoader distributes work and synchronises it across the different worker threads that are launched using the num_workers argument.

Do the threads join at the end of each minibatch (i.e., at the default or user-defined collate_fn), or is each thread non-blocking, continuing to process its share of data across multiple minibatches and writing the processed data into main memory, with the iterator simply picking up a batch from memory once all of its elements are available?

Kindly help me understand the workflow; it would be much appreciated.
Thanks.

DataLoader uses processes instead of threads. After a worker prepares a batch, it puts the batch into a multiprocessing_context.Queue(), from which the main process reads and returns the data.
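To make that flow concrete, here is a minimal standard-library sketch of the pattern, not PyTorch's actual implementation. The names worker_loop and run are made up for illustration, squaring an index stands in for dataset[i] plus collate_fn, and a single shared index queue is a simplification I'm assuming for brevity.

```python
import multiprocessing as mp


def worker_loop(index_queue, result_queue):
    """Hypothetical worker: builds one whole batch per request."""
    while True:
        task = index_queue.get()
        if task is None:  # sentinel: no more work for this worker
            break
        batch_id, indices = task
        batch = [i * i for i in indices]  # stand-in for dataset[i] + collate_fn
        result_queue.put((batch_id, mp.current_process().pid, batch))


def run(num_batches=4, batch_size=3, num_workers=2):
    # Prefer "fork" where it exists; otherwise fall back to "spawn".
    method = "fork" if "fork" in mp.get_all_start_methods() else "spawn"
    ctx = mp.get_context(method)
    index_queue, result_queue = ctx.Queue(), ctx.Queue()
    workers = [
        ctx.Process(target=worker_loop, args=(index_queue, result_queue))
        for _ in range(num_workers)
    ]
    for w in workers:
        w.start()
    # The main process hands out the indices of whole batches.
    for b in range(num_batches):
        index_queue.put((b, list(range(b * batch_size, (b + 1) * batch_size))))
    for _ in workers:
        index_queue.put(None)  # one sentinel per worker
    # The main process reads finished batches back; arrival order may vary,
    # so sort by batch id before returning.
    results = sorted(result_queue.get() for _ in range(num_batches))
    for w in workers:
        w.join()
    return results


if __name__ == "__main__":
    for batch_id, pid, batch in run():
        print(f"batch {batch_id} built by process {pid}: {batch}")
```

Each worker process never blocks on the others: it takes the next set of indices, builds that batch, and pushes it to the result queue for the main process to read.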

I get what you’re saying, but my question is this: given a batch size of 128 and num_workers=128 passed to the DataLoader, are the individual items of a batch distributed across the workers, or is each worker responsible for a whole batch while the other workers process their own batches?

In short, is parallelisation across workers intra-batch or inter-batch?

I would have assumed it to be intra-batch for faster batch processing, but I have some experimental results that hint at inter-batch parallelism. Please clarify this for me.
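One way to check this directly is to have every item report which worker loaded it, using torch.utils.data.get_worker_info(). The sketch below is only an experiment, with a made-up WorkerIdDataset and helper workers_per_batch. As far as I understand the default automatic-batching path, each worker fetches and collates whole batches, so I would expect a single worker id per batch, which would match the inter-batch results.

```python
import multiprocessing as mp

from torch.utils.data import DataLoader, Dataset, get_worker_info


class WorkerIdDataset(Dataset):
    """Hypothetical dataset: every item is the id of the worker that loaded it."""

    def __init__(self, length=16):
        self.length = length

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        info = get_worker_info()  # None when called in the main process
        return -1 if info is None else info.id


def workers_per_batch(batch_size=4, num_workers=2):
    # Prefer "fork" where it exists; otherwise fall back to "spawn".
    method = "fork" if "fork" in mp.get_all_start_methods() else "spawn"
    loader = DataLoader(
        WorkerIdDataset(),
        batch_size=batch_size,
        num_workers=num_workers,
        multiprocessing_context=method,
    )
    # The default collate_fn turns the ints into one tensor per batch;
    # collect the distinct worker ids seen inside each batch.
    return [sorted(set(batch.tolist())) for batch in loader]


if __name__ == "__main__":
    for i, ids in enumerate(workers_per_batch()):
        print(f"batch {i}: loaded by worker(s) {ids}")
```

If a batch ever listed more than one worker id, that would point to intra-batch parallelism instead.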