Once a dataloader has iterated over all of its data, it can start the next pass without being reinitialised. I’ve seen code like this:
    dataloader = DataLoader()
    for epoch in range(10):
        for data, label in dataloader:
            train(data, label)
I noticed that each time the dataloader finished a pass over the data, the next pass took a very long time to start, so I wrote a program to check this.
    import time

    dataloader = getDataProvider()  # length = 100, batch_size = 20
    data_iter = iter(dataloader)
    for i in range(10):
        start = time.time()
        try:
            data, label = next(data_iter)
        except StopIteration:
            print('StopIteration')
            # The pass is exhausted: start a new one on the same dataloader
            data_iter = iter(dataloader)
            data, label = next(data_iter)
        end = time.time()
        print(i, ' takes time ', end - start)
The iteration i=5, in which the StopIteration exception occurred, took 27 seconds. All other iterations took no more than 1 second. Even reinitialising the dataloader takes less than 1 second.
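One way to narrow this down might be to time creating the iterator separately from fetching the first batch. In my loop the first `__iter__()` call sits outside the timed region, while the one in the `except` branch sits inside it. The sketch below uses only the standard library. `SlowStartLoader` and `timed` are hypothetical names, and the `sleep` is just an assumption standing in for whatever per-pass setup a real dataloader does; it does not show that this is the actual cause.

```python
import time


class SlowStartLoader:
    """Hypothetical stand-in for a loader whose per-pass setup is expensive."""

    def __init__(self, num_batches, setup_seconds):
        # Construction only stores settings, so it is cheap.
        self.num_batches = num_batches
        self.setup_seconds = setup_seconds
        self.setup_calls = 0

    def __iter__(self):
        # Assumed per-pass setup cost, paid every time a new pass starts.
        self.setup_calls += 1
        time.sleep(self.setup_seconds)
        return iter(range(self.num_batches))


def timed(fn):
    """Call fn() and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


loader, construct_time = timed(lambda: SlowStartLoader(num_batches=5, setup_seconds=0.2))
batches, iter_time = timed(lambda: iter(loader))
first, fetch_time = timed(lambda: next(batches))

print(f"construct: {construct_time:.3f}s")
print(f"iter():    {iter_time:.3f}s")
print(f"next():    {fetch_time:.3f}s")
```

If the same three timings are taken on the real dataloader, they should show whether the 27 seconds is spent creating the new iterator or fetching the first batch from it.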
I know that the dataloader does some resetting after a StopIteration exception, but why does that take longer than reinitialising the dataloader? What is the correct way to reuse a dataloader?