In DataLoader, do the multiple workers specified by num_workers each load some samples that are then combined into one batch, or does each worker load a whole batch on its own?

I mean, suppose batch_size=4 and num_workers=2. Which of the following matches what happens at runtime?

#1
worker1: loads sample1, sample2
worker2: loads sample3, sample4
batch1 contains sample1, sample2, sample3, sample4

#2
worker1: loads batch1
worker2: loads batch2

Apparently, it would be case #2. See this answer.

I have read the mentioned post, but how can I prove it? Is there any PyTorch source code that shows this?

The source code for the DataLoader logic is too complex for me, but I see two reasons why #2 would be the approach used in practice:

  1. @apaszke said it,
  2. creating a batch from the two workers would require waiting for both of them to finish fetching their samples before the batch could be returned. That is inefficient and (I guess, depending on the architecture, data structure, storage, etc.) brings no benefit over a single process loading sample1, sample2, sample3 and sample4 and then returning the batch.
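
If reading the source is too much, the behaviour can also be checked empirically. Below is a minimal sketch, assuming a toy dataset (WorkerTaggedDataset and collect_worker_ids are hypothetical names made up for this example) whose __getitem__ returns the sample index together with the id of the worker that loaded it, obtained from torch.utils.data.get_worker_info(). If case #2 holds, every batch should carry a single worker id; if case #1 held, ids would be mixed within a batch.

```python
from torch.utils.data import DataLoader, Dataset, get_worker_info


class WorkerTaggedDataset(Dataset):
    """Hypothetical toy dataset: each item is (sample index, id of the loading worker)."""

    def __len__(self):
        return 16

    def __getitem__(self, index):
        info = get_worker_info()  # returns None when called in the main process
        worker_id = info.id if info is not None else -1
        return index, worker_id


def collect_worker_ids(batch_size=4, num_workers=2):
    """Return one (sample indices, worker ids) pair of lists per batch."""
    loader = DataLoader(
        WorkerTaggedDataset(), batch_size=batch_size, num_workers=num_workers
    )
    result = []
    # The default collate function turns the (index, worker_id) tuples
    # into two tensors of length batch_size.
    for indices, worker_ids in loader:
        result.append((indices.tolist(), worker_ids.tolist()))
    return result


if __name__ == "__main__":
    for indices, worker_ids in collect_worker_ids():
        print(indices, worker_ids)
```

This only observes the behaviour of whatever PyTorch version is installed; it does not explain why the loader is designed that way.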

What is your use case?

Regarding your first point, @apaszke only showed the result, but I want to know the reason.
Regarding the second, having multiple workers build one batch may not be slower than a single worker doing it.