I have a question about how num_workers works during training.
Assume that batch size = 1, num_workers = 2.
My understanding was that the process would be:
worker1: fetches 1 mini-batch (1 sample)
worker2: fetches 1 mini-batch (1 sample)
So at the start, the two workers should fetch 2 samples in total for initialization.
But when I tried to verify this with code, the result was different:
the two workers fetched 4 samples up front for initialization, as shown below.
import numpy as np
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    def __init__(self):
        self.data = np.arange(20)

    def __getitem__(self, index):
        # Log every fetch so we can see when each sample is actually loaded
        print("get index{}: ".format(index), self.data[index])
        return self.data[index]

    def __len__(self):
        return 20

if __name__ == '__main__':
    train_dataset = MyDataset()
    train_loader = DataLoader(train_dataset, batch_size=1, shuffle=False, num_workers=2)
    for _, data in enumerate(train_loader):
        print("training: ", data)
The output is:
get index0: 0
get index1: 1
get index2: 2
get index3: 3
training: tensor([0])
training: get index4: 4tensor([1])
get index5: 5
training: get index6: 6
tensor([2])
training: tensor([3])
get index7: 7
training: get index8: 8
tensor([4])
training: get index9: 9
tensor([5])
training: get index10: tensor([6])
10
training: tensor([7])get index11: 11
training: get index12: 12
tensor([8])
training: get index13: 13
tensor([9])
training: tensor([10])get index14:
14
training: get index15: 15
tensor([11])
training: get index16: tensor([12])
16
training: tensor([13])
get index17: 17
training: get index18: 18
tensor([14])
training: tensor([15])
get index19: 19
training: tensor([16])
training: tensor([17])
training: tensor([18])
training: tensor([19])
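For what it's worth, the initial burst of 4 fetches would be consistent with each worker keeping a couple of batches in flight before the training loop consumes anything. Below is a toy bookkeeping sketch of that idea, not DataLoader's actual internals; the per-worker depth of 2 (the prefetch_factor value assumed here) is an assumption on my part:

from queue import Queue

# Hypothetical model: each worker keeps `prefetch_factor` batches in flight,
# so num_workers * prefetch_factor samples get fetched before any batch is consumed.
num_workers = 2
prefetch_factor = 2  # assumed per-worker prefetch depth

fetched = []

def fetch(index):
    # Stand-in for Dataset.__getitem__; records which indices were loaded
    fetched.append(index)
    return index

# Fill the prefetch buffer up front, before the "training loop" takes anything
buffer = Queue()
next_index = 0
for _ in range(num_workers * prefetch_factor):
    buffer.put(fetch(next_index))
    next_index += 1

print(fetched)  # prints [0, 1, 2, 3]

With these numbers the model reproduces the observed pattern: indices 0 through 3 are fetched before the first "training:" line could appear.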
Can you explain this phenomenon to me?
Thanks!