DataLoader worker thread synchronization

I have a general query about how the DataLoader distributes work and synchronises it across the different worker threads that are launched using the num_workers argument.

Do the threads join at the end of each minibatch (i.e., at the default or user-defined collate_fn), or is each thread non-blocking, so that it keeps processing its share of the data across multiple minibatches and writes the processed data into main memory, with the iterator picking up a batch from memory once all of its elements are available?

Kindly help me understand the workflow; it would be much appreciated.
Thanks.

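DataLoader uses processes rather than threads. After a worker has prepared a batch, it puts the batch into a multiprocessing_context.Queue(), from which the main process reads and returns the data. The workers therefore do not join after each minibatch: each worker builds whole batches (including the collate_fn call) independently of the others.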
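
To make the queue-based workflow concrete, here is a minimal standard-library sketch of the same pattern. It is a simplified model, not PyTorch's actual implementation: the names collate, worker_loop and iterate_batches are hypothetical, the "fork" start method is assumed (POSIX only), and every batch is dispatched up front, whereas the real DataLoader only keeps a limited number of batches in flight per worker.

```python
import multiprocessing as mp


def collate(samples):
    # Stand-in for collate_fn: here a batch is simply a list of samples.
    return list(samples)


def worker_loop(dataset, index_queue, result_queue):
    # Each worker handles whole batches, one at a time, until it sees None.
    while True:
        task = index_queue.get()
        if task is None:
            break
        batch_idx, indices = task
        batch = collate([dataset[i] for i in indices])
        result_queue.put((batch_idx, batch))


def iterate_batches(dataset, batch_size, num_workers):
    # Assumption: the "fork" start method (POSIX only) keeps the sketch short.
    ctx = mp.get_context("fork")
    index_queues = [ctx.Queue() for _ in range(num_workers)]
    result_queue = ctx.Queue()
    workers = [
        ctx.Process(target=worker_loop, args=(dataset, q, result_queue), daemon=True)
        for q in index_queues
    ]
    for w in workers:
        w.start()

    # Hand out whole batches of indices round-robin, then a stop signal.
    batches = [list(range(s, min(s + batch_size, len(dataset))))
               for s in range(0, len(dataset), batch_size)]
    for batch_idx, indices in enumerate(batches):
        index_queues[batch_idx % num_workers].put((batch_idx, indices))
    for q in index_queues:
        q.put(None)

    # Results may arrive out of order; buffer them and yield in batch order.
    pending = {}
    for next_idx in range(len(batches)):
        while next_idx not in pending:
            batch_idx, batch = result_queue.get()
            pending[batch_idx] = batch
        yield pending.pop(next_idx)

    for w in workers:
        w.join()
```

Calling list(iterate_batches(list(range(10)), batch_size=4, num_workers=2)) gives [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]. No worker waits for another at a batch boundary; the only synchronisation point is the main process blocking on the result queue until the next batch in order has arrived.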