Suppose batch_size=4 and num_workers=2. Which of the following matches the runtime behavior?
Case #1:
worker1: load sample1, sample2
worker2: load sample3, sample4
batch1 contains sample1, sample2, sample3, sample4

Case #2:
worker1: load batch1
worker2: load batch2
Apparently, it would be case #2. See this answer.
I have read the mentioned post, but how can I prove it? Is there any PyTorch source code that shows this?
The source code for the DataLoader logic is too complex for me, but I see two reasons why #2 would be the approach used in practice:
- @apaszke said it;
- creating a batch from the two workers would require waiting for both of them to finish their task (fetching data) before the batch could be returned, which is inefficient and (I guess depending on the architecture, data structure, storage, etc.) probably no faster than a single process loading sample1, sample2, sample3, and sample4 and then returning the batch.
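To make the second point concrete, here is a plain-Python sketch (my own simplified model, not the actual DataLoader source) of the batch-per-worker dispatch that case #2 describes: the main process groups indices into batches and hands each *whole* batch to one worker, round-robin.

```python
# Hypothetical simplified model of multi-worker dispatch (not PyTorch code):
# each batch of indices is sent to exactly one worker, so a batch is never
# split across workers.

def make_batches(num_samples, batch_size):
    """Group sample indices into consecutive batches (no shuffling)."""
    indices = list(range(num_samples))
    return [indices[i:i + batch_size] for i in range(0, num_samples, batch_size)]

def assign_round_robin(batches, num_workers):
    """Assign each whole batch to one worker, cycling through worker ids."""
    return {batch_id: batch_id % num_workers for batch_id in range(len(batches))}

batches = make_batches(num_samples=8, batch_size=4)
assignment = assign_round_robin(batches, num_workers=2)
print(batches)     # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(assignment)  # {0: 0, 1: 1} -- batch 0 -> worker 0, batch 1 -> worker 1
```

Because a batch is never split, worker 0 can load all of batch 0 while worker 1 independently loads all of batch 1; neither has to wait for the other.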
What is your use case?
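One way to check it empirically (a sketch, assuming a reasonably recent PyTorch; `TaggedDataset` is a name I made up) is to have each sample record the id of the worker that loaded it:

```python
from torch.utils.data import DataLoader, Dataset, get_worker_info

class TaggedDataset(Dataset):
    """Toy dataset: each item carries the id of the worker that loaded it."""

    def __len__(self):
        return 8

    def __getitem__(self, idx):
        info = get_worker_info()  # None in the main process
        worker_id = info.id if info is not None else -1
        return idx, worker_id

if __name__ == "__main__":
    loader = DataLoader(TaggedDataset(), batch_size=4, num_workers=2)
    for indices, worker_ids in loader:
        # If case #2 holds, every sample in a batch has the same worker id.
        print(indices.tolist(), worker_ids.tolist())
```

If each printed batch shows a single worker id repeated four times, that is case #2: one worker loaded the whole batch.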
For your first point, @apaszke only stated the result, but I want to know the reason.
For the second, multiple workers handling one batch may not be slower than a single worker.
In my test case, I use 200 samples and set the batch size to 50. Only 4 subprocesses do any work even if I set num_workers=16 (I have 16 CPUs): with 200 samples and a batch size of 50 there are only 4 batches to load, so only 4 workers ever receive a batch. So I think alex.veuthey is right.
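The arithmetic behind that observation can be checked directly (my own illustration): since each worker loads whole batches, the number of busy workers is bounded by the number of batches, so num_workers=16 cannot help here.

```python
import math

# Each worker loads whole batches, so the number of busy workers is
# bounded by the number of batches, not by num_workers.
num_samples = 200
batch_size = 50
num_workers = 16

num_batches = math.ceil(num_samples / batch_size)
busy_workers = min(num_workers, num_batches)
print(num_batches, busy_workers)  # 4 4
```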