I have two datasets, each containing 60k RGB images (320x320) saved as .pt files, and two DataLoaders (one per dataset). Both DataLoaders use the same parameters: batch_size=256, num_workers=6, shuffle=True, drop_last=True. At each epoch, I get batches from the DataLoaders as in this pseudocode:
data_generator1 = data.DataLoader(...)
data_generator2 = data.DataLoader(...)
num_batches = len(data_generator1)  # note: both generators have the same len

# CUDA events for measuring execution times
start_batch1 = torch.cuda.Event(enable_timing=True)
end_batch1 = torch.cuda.Event(enable_timing=True)
start_batch2 = torch.cuda.Event(enable_timing=True)
end_batch2 = torch.cuda.Event(enable_timing=True)

for epoch in range(epochs):
    t_batch1 = []
    t_batch2 = []
    generator_iterator1 = iter(data_generator1)
    generator_iterator2 = iter(data_generator2)
    for i in range(num_batches):
        try:
            # Get a batch from both dataloaders, timing each fetch
            start_batch1.record()
            batch1, labels1 = next(generator_iterator1)
            end_batch1.record()
            torch.cuda.synchronize()  # wait for the events before reading elapsed_time
            t_batch1.append(start_batch1.elapsed_time(end_batch1))

            start_batch2.record()
            batch2, labels2 = next(generator_iterator2)
            end_batch2.record()
            torch.cuda.synchronize()
            t_batch2.append(start_batch2.elapsed_time(end_batch2))
        except StopIteration:
            # dataloaders are exhausted
            break
        # Train model....
    print("Avg time batch1: ", sum(t_batch1) / len(t_batch1))
    print("Avg time batch2: ", sum(t_batch2) / len(t_batch2))
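For reference, the same fetch pattern can also be timed on the CPU side with time.perf_counter() instead of CUDA events, since next() on a DataLoader is host-side work. This is a minimal self-contained sketch, not my real code: ToyDataset, the small image size, and num_workers=2 are placeholders standing in for the real .pt datasets and settings.

```python
import time
import torch
from torch.utils import data

# Hypothetical tiny in-memory dataset standing in for the real .pt datasets
class ToyDataset(data.Dataset):
    def __init__(self, n=1024):
        self.x = torch.randn(n, 3, 32, 32)  # smaller than 320x320 for a quick test
        self.y = torch.randint(0, 10, (n,))
    def __len__(self):
        return len(self.x)
    def __getitem__(self, i):
        return self.x[i], self.y[i]

loader1 = data.DataLoader(ToyDataset(), batch_size=256, num_workers=2,
                          shuffle=True, drop_last=True)
loader2 = data.DataLoader(ToyDataset(), batch_size=256, num_workers=2,
                          shuffle=True, drop_last=True)

t1, t2 = [], []
it1, it2 = iter(loader1), iter(loader2)
for _ in range(len(loader1)):
    # Time each fetch in milliseconds on the CPU side
    t0 = time.perf_counter()
    batch1, labels1 = next(it1)
    t1.append((time.perf_counter() - t0) * 1000)

    t0 = time.perf_counter()
    batch2, labels2 = next(it2)
    t2.append((time.perf_counter() - t0) * 1000)

print("Avg ms loader1:", sum(t1) / len(t1))
print("Avg ms loader2:", sum(t2) / len(t2))
```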
Now, the problem is that batch1, labels1 = next(generator_iterator1) is on average 5x slower than batch2, labels2 = next(generator_iterator2): roughly 1500 milliseconds vs 300 milliseconds (with 60000/256 = 234 batches in total, that is 5.85 minutes vs 1.17 minutes per epoch). Of course, I would like both to take the minimum amount of time in order to speed up training. At first I thought the problem was in the data (maybe dataset1 is "heavier" than dataset2, or maybe I saved the .pt files with different data types). However, I swapped the order of the two calls, fetching batch2, labels2 = next(generator_iterator2) first and then batch1, labels1 = next(generator_iterator1). And guess what? Now batch2, labels2 = next(generator_iterator2) is the one that is 5x slower. So it is clearly not the kind of data, but rather which DataLoader is fetched first: no matter how I order them, the first DataLoader to be fetched is always the slowest to return its batches. Does anybody know why this is happening?
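In case it helps anyone reproduce the pattern, here is a diagnostic sketch (placeholder dataset and settings, not my real data) that times the first next() of an epoch separately from the remaining fetches, to see whether the cost is concentrated in the very first fetch after iter() is called:

```python
import time
import torch
from torch.utils import data

# Hypothetical small dataset standing in for the real .pt data
class ToyDataset(data.Dataset):
    def __init__(self, n=512):
        self.x = torch.randn(n, 3, 32, 32)
        self.y = torch.randint(0, 10, (n,))
    def __len__(self):
        return len(self.x)
    def __getitem__(self, i):
        return self.x[i], self.y[i]

loader = data.DataLoader(ToyDataset(), batch_size=64, num_workers=2,
                         shuffle=True, drop_last=True)

it = iter(loader)  # with num_workers > 0 this spawns worker processes

# Time the first fetch of the epoch on its own
t0 = time.perf_counter()
batch, labels = next(it)
first_ms = (time.perf_counter() - t0) * 1000

# Time all remaining fetches
rest_ms = []
while True:
    t0 = time.perf_counter()
    try:
        batch, labels = next(it)
    except StopIteration:
        break
    rest_ms.append((time.perf_counter() - t0) * 1000)

print(f"first fetch: {first_ms:.1f} ms, avg of rest: {sum(rest_ms)/len(rest_ms):.1f} ms")
```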
EDIT: I forgot to mention that I am using PyTorch on Ubuntu.