DataLoader first batch of each epoch is slow

Can someone explain why the first batch of each epoch is slow?
For example, if I run with the following settings on the CIFAR dataset:

    trainloader = DataLoader(training_set, batch_size=3,
                             shuffle=True, num_workers=3)

The output I got was:

Epoch : 1
----------------------------
Response Time : 0.10336828231811523 , worker : 1
Response Time : 0.03924226760864258 , worker : 1
Response Time : 0.052294015884399414 , worker : 1
Response Time : 0.04296731948852539 , worker : 1
Response Time : 0.04889869689941406 , worker : 1
Response Time : 0.05538439750671387 , worker : 1
Response Time : 0.11520791053771973 , worker : 0
Response Time : 0.054024457931518555 , worker : 0
Response Time : 0.04679131507873535 , worker : 0
Response Time : 0.05366873741149902 , worker : 0
Response Time : 0.046289920806884766 , worker : 0
Response Time : 0.052154541015625 , worker : 0
Response Time : 0.11386680603027344 , worker : 2
Response Time : 0.05026078224182129 , worker : 2
Response Time : 0.042822837829589844 , worker : 2
Response Time : 0.053331613540649414 , worker : 2
Response Time : 0.0486750602722168 , worker : 2
Response Time : 0.059424638748168945 , worker : 2
Train: Time taken to load batch 1 is 0.41985082626342773
Train: Time taken to load batch 2 is 0.0005145072937011719
Train: Time taken to load batch 3 is 0.0005793571472167969
Epoch : 1 , Total Time Taken : 0.42101407051086426
Epoch : 2
----------------------------
Response Time : 0.07382702827453613 , worker : 2
Response Time : 0.0440211296081543 , worker : 2
Response Time : 0.038907766342163086 , worker : 2
Response Time : 0.04715609550476074 , worker : 2
Response Time : 0.039115190505981445 , worker : 2
Response Time : 0.0476069450378418 , worker : 2
Response Time : 0.07493948936462402 , worker : 0
Response Time : 0.03332066535949707 , worker : 0
Response Time : 0.05902433395385742 , worker : 0
Response Time : 0.06337189674377441 , worker : 0
Response Time : 0.03606534004211426 , worker : 0
Response Time : 0.032224416732788086 , worker : 0
Response Time : 0.10121726989746094 , worker : 1
Response Time : 0.03180074691772461 , worker : 1
Response Time : 0.0492551326751709 , worker : 1
Response Time : 0.04465985298156738 , worker : 1
Response Time : 0.03739476203918457 , worker : 1
Response Time : 0.044376373291015625 , worker : 1
Train: Time taken to load batch 1 is 0.3801271915435791
Train: Time taken to load batch 2 is 0.009565353393554688
Train: Time taken to load batch 3 is 0.0005276203155517578
Epoch : 2 , Total Time Taken : 0.3902895450592041
Epoch : 3
----------------------------
Response Time : 0.053179264068603516 , worker : 2
Response Time : 0.0395512580871582 , worker : 2
Response Time : 0.033089399337768555 , worker : 2
Response Time : 0.04987502098083496 , worker : 2
Response Time : 0.0494382381439209 , worker : 2
Response Time : 0.09168267250061035 , worker : 2
Response Time : 0.06848335266113281 , worker : 0
Response Time : 0.06932640075683594 , worker : 0
Response Time : 0.09342813491821289 , worker : 0
Response Time : 0.07962489128112793 , worker : 0
Response Time : 0.0555117130279541 , worker : 0
Response Time : 0.03752279281616211 , worker : 0
Response Time : 0.05451393127441406 , worker : 1
Response Time : 0.07509756088256836 , worker : 1
Response Time : 0.10676288604736328 , worker : 1
Response Time : 0.08155059814453125 , worker : 1
Response Time : 0.05485701560974121 , worker : 1
Response Time : 0.036406517028808594 , worker : 1
Train: Time taken to load batch 1 is 0.44634461402893066
Train: Time taken to load batch 2 is 0.0030670166015625
Train: Time taken to load batch 3 is 0.0004851818084716797
Epoch : 3 , Total Time Taken : 0.44997358322143555

From the output, shouldn't each worker load three samples? Why does each worker load six samples?
I'm very confused. Any help is appreciated.

The DataLoader iterator is recreated in each epoch, and with it all workers, which will all start prefetching from the beginning again.
Here is a potential workaround for this behavior.
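One option along those lines, assuming a recent enough PyTorch version in which DataLoader supports the persistent_workers argument, is to keep the worker processes alive across epochs so they are not respawned (and do not start prefetching from scratch) at every epoch start. A minimal sketch:

    import torchvision
    import torchvision.transforms as transforms
    from torch.utils.data import DataLoader

    training_set = torchvision.datasets.CIFAR10(
        root='./data', train=True, download=True,
        transform=transforms.ToTensor())

    # persistent_workers keeps the worker processes alive between epochs,
    # so they do not have to be restarted (and warm up again) every epoch.
    trainloader = DataLoader(training_set, batch_size=3,
                             shuffle=True, num_workers=3,
                             persistent_workers=True)

    for epoch in range(3):
        for data, target in trainloader:
            pass  # training step would go here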

Thanks, it does seem a bit faster now. Unrelated to the topic, I hope I can ask a different question: I tried passing a list of indices to the dataset, but __getitem__ in the CIFAR10/COCO datasets raises "TypeError: list indices must be integers or slices, not list" when running the code. The list of indices comes from a BatchSampler. Any reason why?

I think torchvision.datasets.CIFAR10 is not able to accept a list of indices out of the box, as its __getitem__ would try to apply the Image.fromarray method as well as the transformations to a list of images, which should yield an error.
You could extend this class as a custom CIFAR10 dataset and apply these methods in a loop for each loaded sample.
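A rough sketch of what that could look like, assuming the transform converts each image to a tensor (e.g. ToTensor) so the samples can be stacked; the class name BatchCIFAR10 is just a placeholder:

    import torch
    import torchvision

    class BatchCIFAR10(torchvision.datasets.CIFAR10):
        # CIFAR10 variant whose __getitem__ accepts a list of indices and
        # applies Image.fromarray + the transforms per sample by delegating
        # to the parent __getitem__ in a loop.
        def __getitem__(self, indices):
            samples = [super(BatchCIFAR10, self).__getitem__(i) for i in indices]
            images = torch.stack([img for img, _ in samples])
            targets = torch.tensor([target for _, target in samples])
            return images, targets

I believe you could then pass your BatchSampler to the DataLoader via the sampler argument with batch_size=None, so that the whole list of indices is handed to __getitem__ and the loader does not try to batch the samples again.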

So something like this would be fine? Would it make the data loader slower than normal?


    from torch.utils.data import Dataset

    class BatchWrapperDataset(Dataset):
        def __init__(self, data):
            # DO STUFF
            self.data = data  # the underlying dataset, e.g. CIFAR10/ImageNet

        def __getitem__(self, index):
            # index is a list of indices coming from the BatchSampler
            batch_list = [[], []]
            for i in index:
                result = self.data[i]
                batch_list[0].append(result[0])
                batch_list[1].append(result[1])

            return batch_list  # <------ this contains a batch

I don’t think you would see a significant slowdown, as currently each worker is creating a complete batch anyway.
However, a quick profiling run would still be interesting.
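Something as simple as the helper below would already show the per-batch loading time across a few epochs (time_batches is just an illustrative name, not a torch utility):

    import time
    from torch.utils.data import DataLoader

    def time_batches(loader: DataLoader, num_epochs: int = 3) -> None:
        # Print how long each batch takes to arrive from the loader.
        for epoch in range(num_epochs):
            start = time.time()
            for i, batch in enumerate(loader):
                print(f'epoch {epoch}, batch {i}: loaded in {time.time() - start:.4f}s')
                start = time.time()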