Different batches take different times to load

Pramodith · June 18, 2018, 7:22pm

I’m using the snippet below to measure how long each batch takes to load:

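```python
import time
import torch

torch.cuda.synchronize()  # make sure no earlier GPU work is still pending
st_batch = time.time()
for cnt, batch in enumerate(train_data_loader):
    torch.cuda.synchronize()
    end_batch = time.time()
    print(end_batch - st_batch)  # seconds spent waiting for this batch
    # do something
    torch.cuda.synchronize()  # finish this iteration's GPU work before restarting the timer
    st_batch = time.time()
```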
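The same measurement can also be wrapped around the iterator itself, so that only the time spent waiting for the next batch is counted. This is a minimal sketch that assumes only that the loader is an iterable. `timed_batches` and `slow_loader` are hypothetical names used for illustration. It times the `next()` call on the CPU side, so for GPU work you would still call `torch.cuda.synchronize()` before reading the clock, as in the snippet.

```python
import time
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def timed_batches(loader: Iterable[T]) -> Iterator[Tuple[T, float]]:
    """Yield (batch, seconds spent waiting for that batch) for any iterable.

    Only the call to next() is timed, so work the caller does between
    batches (forward/backward pass, logging, ...) is not counted.
    """
    it = iter(loader)
    while True:
        start = time.perf_counter()
        try:
            batch = next(it)
        except StopIteration:
            return
        yield batch, time.perf_counter() - start


def slow_loader():
    """Hypothetical stand-in for a DataLoader: the second batch is slow."""
    for i in range(3):
        if i == 1:
            time.sleep(0.2)  # simulates an expensive __getitem__
        yield i


for batch, wait in timed_batches(slow_loader()):
    print(f"batch {batch}: waited {wait:.3f}s")
```

Because the generator is suspended while the loop body runs, a large value for one batch points at the loader, not at the training step.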

I observe that some batches take 3 to 5 seconds, whereas others take just 0.002 s.

I assume this is a bottleneck while training my network. Is there any way around this?

ptrblck · June 20, 2018, 6:05am

Are you using multiple workers in your DataLoader?
If so, could you try increasing the number?
It looks like your training procedure sometimes has to wait for the DataLoader to provide new batches. Also, is your data stored locally on an SSD?
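The DataLoader suggestions above can be sketched as follows, assuming PyTorch 1.7 or newer (needed for `persistent_workers`). `SlowDataset` is a hypothetical dataset whose `__getitem__` sleeps to mimic slow I/O, and `make_loader` is a hypothetical helper. The worker count of 4 is an arbitrary choice for the comparison, not a recommendation.

```python
import time

import torch
from torch.utils.data import DataLoader, Dataset


class SlowDataset(Dataset):
    """Hypothetical dataset whose __getitem__ sleeps to mimic slow disk I/O."""

    def __init__(self, n: int = 64, delay: float = 0.01):
        self.n = n
        self.delay = delay

    def __len__(self) -> int:
        return self.n

    def __getitem__(self, idx: int) -> torch.Tensor:
        time.sleep(self.delay)  # stand-in for reading/decoding one sample
        return torch.full((3,), float(idx))


def make_loader(num_workers: int) -> DataLoader:
    return DataLoader(
        SlowDataset(),
        batch_size=8,
        shuffle=True,
        num_workers=num_workers,               # background processes that load batches
        pin_memory=torch.cuda.is_available(),  # page-locked memory for faster host-to-GPU copies
        persistent_workers=num_workers > 0,    # keep workers alive across epochs (needs num_workers > 0)
    )


if __name__ == "__main__":
    for workers in (0, 4):
        start = time.perf_counter()
        for batch in make_loader(workers):
            pass  # a training step would go here
        print(f"num_workers={workers}: {time.perf_counter() - start:.2f}s")
```

With `num_workers=0` every sample is loaded in the main process, so the loop stalls on each slow item. With several workers, upcoming batches are prepared in parallel while the current one is being processed. `persistent_workers=True` raises an error when `num_workers` is 0, hence the conditional.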

Pramodith · June 20, 2018, 7:50am

There was some overhead in my `__getitem__` function. I fixed it, and it works fine now.

ammary-mo · April 4, 2021, 2:50pm

Hi @Pramodith, can you share what the overhead in your `__getitem__` was?
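The thread does not say what the overhead actually was, so the following only illustrates one common cause: doing expensive setup, such as opening and parsing a file, inside `__getitem__` on every call instead of once in `__init__`. The class and helper names are hypothetical. Both classes follow the map-style `__len__`/`__getitem__` protocol that `torch.utils.data.DataLoader` accepts, but plain Python is used so the sketch runs without PyTorch.

```python
import json
import os
import tempfile
import time


class RereadingDataset:
    """Map-style dataset that re-opens and re-parses the file on every lookup."""

    def __init__(self, path: str):
        self.path = path
        with open(path) as f:
            self.length = len(json.load(f))

    def __len__(self) -> int:
        return self.length

    def __getitem__(self, idx: int):
        with open(self.path) as f:  # per-item file read and full parse: the overhead
            return json.load(f)[idx]


class PreloadedDataset:
    """Same data, but parsed once in __init__; __getitem__ is a list lookup."""

    def __init__(self, path: str):
        with open(path) as f:
            self.records = json.load(f)

    def __len__(self) -> int:
        return len(self.records)

    def __getitem__(self, idx: int):
        return self.records[idx]


def time_all_items(dataset) -> float:
    """Return the seconds needed to fetch every item once."""
    start = time.perf_counter()
    for i in range(len(dataset)):
        dataset[i]
    return time.perf_counter() - start


if __name__ == "__main__":
    records = [{"id": i, "features": list(range(10))} for i in range(500)]
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
        json.dump(records, f)
        path = f.name
    try:
        print(f"re-reading: {time_all_items(RereadingDataset(path)):.3f}s")
        print(f"preloaded:  {time_all_items(PreloadedDataset(path)):.3f}s")
    finally:
        os.remove(path)
```

Preloading trades memory for speed. If the data does not fit in RAM, a middle ground is to open the file handle lazily once per worker and reuse it, rather than reopening and reparsing it for every item.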