After a dataloader has iterated over all of its data, it can start the next pass without being reinitialised. I’ve seen code like this:
    dataloader = DataLoader()
    for epoch in range(10):
        for data, label in dataloader:
            train(data, label)
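The pattern above works because a Python `for` loop asks the iterable for a fresh iterator each time it starts. Here is a minimal pure-Python sketch of that behaviour. `ToyLoader` is a hypothetical stand-in written for illustration, not the PyTorch `DataLoader`, and it only counts how often a new iterator is requested.

```python
class ToyLoader:
    """Hypothetical stand-in for a dataloader; not the PyTorch class."""

    def __init__(self, samples, batch_size):
        self.samples = samples
        self.batch_size = batch_size
        self.iter_calls = 0  # how many times a fresh iterator was requested

    def __iter__(self):
        # Runs once at the start of every `for ... in loader` loop.
        self.iter_calls += 1
        return self._batches()

    def _batches(self):
        # Yield consecutive slices of batch_size samples.
        for start in range(0, len(self.samples), self.batch_size):
            yield self.samples[start:start + self.batch_size]


loader = ToyLoader(list(range(100)), batch_size=20)
batches_seen = 0
for epoch in range(3):
    for batch in loader:
        batches_seen += 1

print(loader.iter_calls, batches_seen)
```

So every epoch pays whatever it costs to build a new iterator, even though the loader object itself is created only once.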
I noticed that each time the dataloader finished a full pass over the data, the next pass took a very long time to start. So I wrote a small program to check this.
    import time

    dataloader = getDataProvider()  # length = 100, batch_size = 20
    data_iter = iter(dataloader)  # named data_iter to avoid shadowing the built-in iter()
    for i in range(10):
        start = time.time()
        try:
            data, label = next(data_iter)
        except StopIteration:
            print('StopIteration')
            # The iterator is exhausted: request a fresh one from the dataloader.
            data_iter = iter(dataloader)
            data, label = next(data_iter)
        end = time.time()
        print(i, ' takes time ', end - start)
The iteration i=5, in which the StopIteration exception occurred, took 27 seconds. Every other iteration took no more than 1 second. Even constructing a new dataloader takes less than 1 second.
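One way to narrow down where those seconds go is to time creating the iterator separately from fetching the first batch. This is a minimal sketch under stated assumptions: `SlowStartLoader` and `time_restart` are hypothetical names made up for illustration, and the startup delay is simulated with `time.sleep` rather than measured from a real dataloader.

```python
import time


class SlowStartLoader:
    """Hypothetical loader whose iterator is expensive to create."""

    def __init__(self, batches, startup_seconds):
        self.batches = batches
        self.startup_seconds = startup_seconds

    def __iter__(self):
        # Simulated setup cost (an assumption, standing in for real startup work).
        time.sleep(self.startup_seconds)
        return iter(self.batches)


def time_restart(loader):
    """Return (seconds to create the iterator, seconds to fetch the first batch, first batch)."""
    t0 = time.perf_counter()
    it = iter(loader)
    t1 = time.perf_counter()
    first = next(it)
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, first


create_s, fetch_s, first_batch = time_restart(SlowStartLoader([[0, 1], [2, 3]], 0.05))
print(f"iterator creation: {create_s:.3f}s, first batch: {fetch_s:.3f}s")
```

Passing the real dataloader to `time_restart` would show whether the delay sits in creating the iterator or in waiting for the first batch.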
I know that the dataloader does some kind of reset after a StopIteration exception, but why does that take so much longer than constructing a new dataloader? What is the correct way to reuse a dataloader?