Currently my training code is similar to the following:
    loader = DataLoader(dataset, num_workers=50)
    for epoch in range(num_epochs):
        for i, (imgs, targets) in enumerate(loader):
            ...
My inner loop takes about 4 minutes to finish, then it takes about another 1 minute to construct a new _DataLoaderIter from my loader for the next epoch. I’ve been trying to speed up the data loader but I haven’t found a working solution yet. So far I’ve tried:
    loader = DataLoader(dataset, num_workers=50)
    iterator = iter(loader)
    for epoch in range(num_epochs):
        for i, (imgs, targets) in enumerate(iterator):
            ...
and
    loader = DataLoader(dataset, num_workers=50)
    iterator = iter(loader)
    for epoch in range(num_epochs):
        for i, (imgs, targets) in enumerate(iterator):
            ...
        iterator.__reset__()
I’m not concerned about shuffling my data, so running through the same data order should be fine, since my dataset applies fresh augmentation to each image anyway. I really just need a soft reset of the DataLoader so the inner loop can start again as fast as possible. Any suggestions?
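For concreteness, is something like the following possible? This is a minimal sketch of the behavior I'm after, assuming a PyTorch version whose DataLoader accepts persistent_workers=True (it exists in newer releases), which keeps the worker processes alive across epochs instead of respawning them every time a new iterator is constructed:

```python
# Sketch only: assumes a PyTorch release where DataLoader supports
# persistent_workers=True (so workers survive between epochs).
import torch
from torch.utils.data import DataLoader, TensorDataset

# Tiny stand-in dataset: 8 samples, batch size 4 -> 2 batches per epoch.
dataset = TensorDataset(torch.arange(8).float().unsqueeze(1),
                        torch.arange(8))
loader = DataLoader(dataset, batch_size=4, num_workers=2,
                    persistent_workers=True)

batches_seen = 0
for epoch in range(3):
    # Each epoch reuses the same worker pool; no per-epoch respawn cost.
    for imgs, targets in loader:
        batches_seen += 1
```

If that option (or an equivalent soft reset) works, the one-minute iterator construction between epochs should only be paid once, on the first epoch.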