Hi, I am trying to use the DataLoader's built-in num_workers to parallelize my batch loading, but I can't see any significant gains from it. I wrote a small toy example to measure the effect of num_workers on the runtime of my code. Here is my code:
import numpy as np
import time
import torch
from torch.utils.data import TensorDataset
from torch.utils.data import DataLoader

num_workers = 0
T = 10
num_samples = 500000
num_features = 100
batch_size = 1024
num_epochs = 20

X = np.random.uniform(size=(num_samples, T, num_features))
y = np.random.uniform(size=(num_samples,))

X = torch.Tensor(X)
y = torch.Tensor(y)

dataset = TensorDataset(X, y)
dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True, num_workers=num_workers)

# time.clock() was removed in Python 3.8 (and measured CPU time on Unix);
# time.perf_counter() gives the wall-clock time we actually want here
tic = time.perf_counter()
for epoch in range(num_epochs):
    for batch_idx, (x, target) in enumerate(dataloader):
        continue
    print("==> Epoch:", epoch)
toc = time.perf_counter()
print()
print("Run Time:", toc - tic)
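In case it helps to see where the time goes, here is a small variation that times each epoch separately (just a sketch of the same setup; epoch_times is my own name for the list). As far as I understand, the DataLoader spawns a fresh set of worker processes every time a new iterator is created, i.e., once per epoch, unless persistent_workers=True is set (available in PyTorch 1.7+), so per-epoch numbers should make that startup overhead visible:

epoch_times = []
for epoch in range(num_epochs):
    t0 = time.perf_counter()
    for x, target in dataloader:
        continue
    # each pass of the outer loop builds a fresh iterator, so with
    # num_workers > 0 the worker startup cost is paid again every epoch
    epoch_times.append(time.perf_counter() - t0)
print("first epoch:", epoch_times[0],
      "mean of remaining epochs:", sum(epoch_times[1:]) / (num_epochs - 1))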
I ran this code with various configurations of num_workers and batch size, and recorded the run time:
Batch Size = 128, Num_workers = 0, Run Time = 26.278781
Batch Size = 128, Num_workers = 2, Run Time = 44.073031
Batch Size = 128, Num_workers = 4, Run Time = 45.135034
Batch Size = 128, Num_workers = 128, Run Time = 102.223168

Batch Size = 256, Num_workers = 0, Run Time = 28.837365
Batch Size = 256, Num_workers = 2, Run Time = 28.169192
Batch Size = 256, Num_workers = 4, Run Time = 29.175953
Batch Size = 256, Num_workers = 128, Run Time = 85.561222

Batch Size = 1024, Num_workers = 0, Run Time = 35.239104
Batch Size = 1024, Num_workers = 2, Run Time = 14.877822
Batch Size = 1024, Num_workers = 4, Run Time = 17.189713
Batch Size = 1024, Num_workers = 128, Run Time = 73.567457
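For reference, the sweep over configurations can be scripted along these lines (a sketch, not my exact script; the time_loading helper is just a name I made up, and I use fewer epochs here to keep it quick):

def time_loading(batch_size, num_workers, num_epochs=5):
    # hypothetical helper: wall-clock seconds to iterate the dataset num_epochs times
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True,
                        num_workers=num_workers)
    tic = time.perf_counter()
    for _ in range(num_epochs):
        for x, target in loader:
            continue
    return time.perf_counter() - tic

for bs in [128, 256, 1024]:
    for nw in [0, 2, 4, 128]:
        print(f"Batch Size = {bs}, Num_workers = {nw}, Run Time = {time_loading(bs, nw):.6f}")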
Can someone explain how num_workers affects the runtime? Ideally, increasing num_workers should decrease the data loading time. So why does the run time increase in some cases and decrease in others?