How can I improve the efficiency of _DataLoaderIter

Currently my training code is similar to the following:

loader = DataLoader(dataset, num_workers=50)
for epoch in range(num_epochs):
    for i, (imgs, targets) in enumerate(loader):
        ...

My inner loop takes about 4 minutes to finish, and then it takes roughly another minute to construct a new _DataLoaderIter from my loader for the next epoch. I’ve been trying to speed up the data loading but haven’t found a working solution yet. So far I’ve tried:

loader = DataLoader(dataset, num_workers=50)
iterator = iter(loader)
for epoch in range(num_epochs):
    for i, (imgs, targets) in enumerate(iterator):
        ...

and

loader = DataLoader(dataset, num_workers=50)
iterator = iter(loader)
for epoch in range(num_epochs):
    for i, (imgs, targets) in enumerate(iterator):
        ...
    iterator.__reset__()

I’m not concerned about shuffling my data, so running through the samples in the same order every epoch should be sufficient, since my dataset performs new augmentation for each image anyway. So I really just need a soft reset of the dataloader’s iterator, so I can start the inner loop again as fast as possible. Any suggestions?
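One workaround I’m considering (sketch only, not benchmarked — `RepeatSampler` is my own helper, not a PyTorch class) is to fold all epochs into a single pass, so the iterator and its 50 workers are only constructed once:

```python
from itertools import islice

class RepeatSampler:
    """Yields the dataset indices `repeats` times in a row, so a single
    DataLoader iterator (one worker pool) covers `repeats` epochs."""
    def __init__(self, dataset_len, repeats):
        self.dataset_len = dataset_len
        self.repeats = repeats

    def __iter__(self):
        for _ in range(self.repeats):
            yield from range(self.dataset_len)

    def __len__(self):
        return self.dataset_len * self.repeats

# Usage with a DataLoader would then look roughly like:
#   loader = DataLoader(dataset, num_workers=50,
#                       sampler=RepeatSampler(len(dataset), num_epochs))
#   it = iter(loader)                        # workers start once
#   for epoch in range(num_epochs):
#       for imgs, targets in islice(it, batches_per_epoch):
#           ...
```

The per-epoch bookkeeping (logging, validation, etc.) would then hang off the `islice` boundaries instead of a fresh iterator.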

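If upgrading is an option: newer PyTorch releases (1.7 and later, if I remember correctly) added a `persistent_workers` flag to DataLoader that keeps the worker processes alive between epochs, which removes exactly this per-epoch startup cost. A minimal sketch:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.arange(8).float(), torch.arange(8))

# persistent_workers=True requires num_workers > 0; the worker pool is
# created on the first iteration and reused for every later epoch.
loader = DataLoader(dataset, batch_size=4, num_workers=2,
                    persistent_workers=True)

for epoch in range(3):
    for imgs, targets in loader:  # iter() is cheap after the first epoch
        pass
```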
Are you reading the data in the __init__ method of your Dataset?
If recreating the DataLoader iterator takes that long, it might actually point to a problem in your Dataset.
Could you post the code of your Dataset so that we can have a look?
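In case it helps: the usual pattern is to keep __init__ cheap (store paths only) and do the actual reading in __getitem__, so the expensive work happens inside the worker processes. A generic sketch (`LazyImageDataset` and `load_fn` are placeholder names — `load_fn` stands in for whatever decoding you use, e.g. PIL’s `Image.open`; a map-style dataset only needs `__len__` and `__getitem__`, so no torch import is required for the sketch):

```python
class LazyImageDataset:
    """Map-style dataset: __init__ just records paths; the expensive
    decoding happens per item in __getitem__, inside the DataLoader
    worker processes instead of the main process."""
    def __init__(self, paths, targets, load_fn, transform=None):
        assert len(paths) == len(targets)
        self.paths = list(paths)        # cheap: only strings are stored
        self.targets = list(targets)
        self.load_fn = load_fn          # e.g. PIL.Image.open
        self.transform = transform

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        img = self.load_fn(self.paths[idx])   # the read happens lazily, here
        if self.transform is not None:
            img = self.transform(img)         # fresh augmentation each call
        return img, self.targets[idx]
```

If your current Dataset instead loads everything up front in __init__, every worker pays that cost again when the iterator is rebuilt, which would explain the one-minute gap between epochs.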