Question about the num_workers in DataLoader

I have a question about how num_workers works during training.

Assume that batch size = 1, num_workers = 2.
As I understand it, the process should be:
worker1: 1 mini-batch (1 sample)
worker2: 1 mini-batch (1 sample)
So, at the start, the two workers should fetch 2 samples in total for initialization.

But when I tried to verify this with code, the result was different.
The two workers fetched 4 samples up front for initialization, as shown below.

import numpy as np
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    def __init__(self):
        self.data = np.arange(20)

    def __getitem__(self, index):
        # Log every fetch so the order of worker requests is visible.
        print("get index{}: ".format(index), self.data[index])
        return self.data[index]

    def __len__(self):
        return len(self.data)

if __name__ == '__main__':
    train_dataset = MyDataset()
    train_loader = DataLoader(train_dataset, batch_size=1, shuffle=False, num_workers=2)

    for data in train_loader:
        print("training: ", data)

The output is:

get index0:  0
get index1:  1
get index2:  2
get index3:  3
training:  tensor([0])
training:  get index4:  4tensor([1])

get index5:  5
training:  get index6:  6
training:  tensor([3])
get index7:  7
training:  get index8:  8
training:  get index9:  9
training:  get index10: tensor([6]) 
training:  tensor([7])get index11:  11

training:  get index12:  12
training:  get index13:  13
training:  tensor([10])get index14: 
training:  get index15:  15
training:  get index16: tensor([12]) 
training:  tensor([13])
get index17:  17
training:  get index18:  18
training:  tensor([15])
get index19:  19
training:  tensor([16])
training:  tensor([17])
training:  tensor([18])
training:  tensor([19])

Can you explain this phenomenon to me?

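The DataLoader prefetches batches ahead of time, and how many is controlled by prefetch_factor, as described in the docs.
prefetch_factor is the number of batches each worker loads in advance. It defaults to 2 when num_workers > 0, so a total of 2 * num_workers batches are prefetched across all workers. In your case that is 2 * 2 = 4 batches of 1 sample each, which explains the 4 samples loaded at the beginning.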
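
To make the arithmetic concrete, here is a minimal sketch that models which indices get requested before the first batch is yielded. It is not PyTorch's actual code: the function name initial_prefetch is hypothetical, and the round-robin hand-out of batches to workers is an assumption about the scheduling that is consistent with the output above.

```python
def initial_prefetch(num_samples, batch_size, num_workers, prefetch_factor=2):
    """Model the index batches requested up front, grouped by worker.

    Simplified model only: assumes shuffle=False and that batches are
    handed to workers in round-robin order (an assumption, not library code).
    """
    # Sequential batches of sample indices, as with shuffle=False.
    batches = [
        list(range(start, min(start + batch_size, num_samples)))
        for start in range(0, num_samples, batch_size)
    ]
    # prefetch_factor batches per worker are requested before iteration starts.
    outstanding = batches[: prefetch_factor * num_workers]
    per_worker = {worker: [] for worker in range(num_workers)}
    for i, batch in enumerate(outstanding):
        per_worker[i % num_workers].append(batch)
    return per_worker


if __name__ == "__main__":
    # Default prefetch_factor=2 with 2 workers: 4 samples fetched up front.
    print(initial_prefetch(20, 1, 2))     # {0: [[0], [2]], 1: [[1], [3]]}
    # prefetch_factor=1: one batch per worker, 2 samples up front.
    print(initial_prefetch(20, 1, 2, 1))  # {0: [[0]], 1: [[1]]}
```

If you want the behaviour you originally expected (one batch in flight per worker), you can pass prefetch_factor=1 to the DataLoader, e.g. DataLoader(train_dataset, batch_size=1, shuffle=False, num_workers=2, prefetch_factor=1). Keeping the default is usually preferable, since the extra prefetched batches help hide data loading latency.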