I have two datasets, each containing 60k RGB images (320x320) saved as .pt files, and two DataLoaders (one per dataset). Both DataLoaders use identical parameters: batch_size=256, num_workers=6, shuffle=True, drop_last=True. At each epoch, I fetch batches from the DataLoaders as in this pseudocode:
```python
data_generator1 = data.DataLoader(...)
data_generator2 = data.DataLoader(...)
num_batches = len(data_generator1)  # note: both generators have the same len

# tools for measuring execution times
start_batch1 = torch.cuda.Event(enable_timing=True)
end_batch1 = torch.cuda.Event(enable_timing=True)
start_batch2 = torch.cuda.Event(enable_timing=True)
end_batch2 = torch.cuda.Event(enable_timing=True)

for epoch in range(epochs):
    t_batch1 = []
    t_batch2 = []
    generator_iterator1 = iter(data_generator1)
    generator_iterator2 = iter(data_generator2)

    for i in range(num_batches):
        try:
            # Get a batch from both dataloaders
            start_batch1.record()
            batch1, labels1 = next(generator_iterator1)
            end_batch1.record()
            torch.cuda.synchronize()
            t_batch1.append(start_batch1.elapsed_time(end_batch1))

            start_batch2.record()
            batch2, labels2 = next(generator_iterator2)
            end_batch2.record()
            torch.cuda.synchronize()
            t_batch2.append(start_batch2.elapsed_time(end_batch2))
        except StopIteration:
            # dataloaders are empty
            break

        # Train model...

    print("Avg time batch1: ", sum(t_batch1) / len(t_batch1))
    print("Avg time batch2: ", sum(t_batch2) / len(t_batch2))
```
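For context, the loaders are built roughly like this (a minimal, self-contained sketch: the tiny synthetic tensors and file names stand in for my real 60k-image .pt files, and I am assuming here that each file stores an (images, labels) tuple):

```python
import torch
from torch.utils import data

# Synthetic stand-ins for the real datasets (small sizes so the sketch runs quickly);
# in reality each .pt file holds 60k 3x320x320 images
torch.save((torch.randn(512, 3, 32, 32), torch.randint(0, 10, (512,))), "dataset1.pt")
torch.save((torch.randn(512, 3, 32, 32), torch.randint(0, 10, (512,))), "dataset2.pt")

images1, labels1 = torch.load("dataset1.pt")
images2, labels2 = torch.load("dataset2.pt")

data_generator1 = data.DataLoader(
    data.TensorDataset(images1, labels1),
    batch_size=256, num_workers=6, shuffle=True, drop_last=True)
data_generator2 = data.DataLoader(
    data.TensorDataset(images2, labels2),
    batch_size=256, num_workers=6, shuffle=True, drop_last=True)

print(len(data_generator1), len(data_generator2))
```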
Now, the problem is that
`batch1, labels1 = next(generator_iterator1)` is on average 5x slower than
`batch2, labels2 = next(generator_iterator2)`, i.e. 1500 milliseconds vs 300 milliseconds (with 60000/256 = 234 batches in total, this amounts to 5.85 minutes vs 1.17 minutes per epoch). Of course, I would like both to take the minimum amount of time in order to speed up training. At first I thought the problem was the data itself (maybe dataset1 is "heavier" than dataset2, or maybe I saved the two .pt files with different data types). However, I swapped the order of the two calls in the code, placing
`batch2, labels2 = next(generator_iterator2)` first and then
`batch1, labels1 = next(generator_iterator1)`. And guess what? Now
`batch2, labels2 = next(generator_iterator2)` is 5x slower. So it is clearly not the kind of data, but which DataLoader is fetched first: whichever one I fetch first is always the slowest to return its batches. Does anybody know why this is happening?
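To isolate the per-call cost, here is a minimal, self-contained sketch (synthetic tensors and reduced sizes in place of my real data) that wall-clock-times each individual next() call with time.perf_counter() instead of CUDA events, e.g. to see whether the cost is concentrated in the first fetch or spread over all fetches:

```python
import time
import torch
from torch.utils import data

# Synthetic stand-in for one dataset (sizes reduced so the sketch runs quickly)
dataset = data.TensorDataset(torch.randn(1024, 3, 32, 32),
                             torch.randint(0, 10, (1024,)))
loader = data.DataLoader(dataset, batch_size=256, num_workers=2,
                         shuffle=True, drop_last=True)

it = iter(loader)
times_ms = []
for _ in range(len(loader)):
    t0 = time.perf_counter()
    batch, labels = next(it)
    times_ms.append((time.perf_counter() - t0) * 1e3)

# Per-call wall-clock time of each next(), in milliseconds
print(["%.1f" % t for t in times_ms])
```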
EDIT: I forgot to mention that I am using PyTorch on Ubuntu.