Does the DataLoader prefetch batches for the next epoch once the current epoch has been consumed?

When I have many small epochs, can it prefetch the first batches of the next epoch? I would assume this could be done if persistent_workers=True.

Yes. With persistent_workers=True, the workers are not shut down at the end of an epoch and are not restarted for the new one, so batches will be loaded continuously.

I guess there is a problem with that, given that we often call sampler.set_epoch(epoch) before each new epoch, especially in a distributed context. Any prefetched batches would need to be discarded somehow, and the sampler would need to be re-evaluated.
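To illustrate why this matters, here is a small sketch using DistributedSampler, which seeds its shuffle with seed + epoch. It assumes a toy dataset of 10 items and passes num_replicas=1 and rank=0 explicitly so that no process group has to be initialized. The index order depends on the epoch, so anything prefetched before set_epoch is called for the new epoch would follow the stale order.

```python
from torch.utils.data import DistributedSampler

dataset = list(range(10))  # toy dataset, only its length matters to the sampler
# num_replicas/rank are given explicitly, so no process group is required for this demo
sampler = DistributedSampler(dataset, num_replicas=1, rank=0, shuffle=True, seed=0)

sampler.set_epoch(0)
order_epoch0 = list(sampler)  # shuffle seeded with seed + 0
sampler.set_epoch(1)
order_epoch1 = list(sampler)  # shuffle seeded with seed + 1, a different permutation

# Batches prefetched before set_epoch(1) would still follow order_epoch0.
print(order_epoch0)
print(order_epoch1)
```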

That’s a good point and I guess I’m wrong.
Based on this code, the reset method is called when the iterator for the next epoch is requested, and it seems to pick up the updated sampler.
You could test this with a small snippet that uses a sampler with a known index order and check whether persistent workers behave the same way as non-persistent ones.
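A minimal sketch of such a check, assuming a toy dataset where dataset[i] == i (so each batch reveals the sampled indices) and a DistributedSampler built with num_replicas=1 and rank=0 so no process group is needed. The helper name epoch_orders is made up for this example. It records the index order of every epoch with and without persistent workers; if the reset path re-reads the sampler as the code suggests, the two results should match.

```python
from torch.utils.data import DataLoader, DistributedSampler


def epoch_orders(persistent_workers, num_epochs=3):
    """Hypothetical helper: return the index order seen in each epoch."""
    dataset = list(range(10))  # dataset[i] == i, so batches show the sampled indices
    # num_replicas/rank are passed explicitly, so no process group is needed
    sampler = DistributedSampler(dataset, num_replicas=1, rank=0, shuffle=True, seed=0)
    loader = DataLoader(
        dataset,
        batch_size=4,
        sampler=sampler,
        num_workers=2,
        persistent_workers=persistent_workers,
        collate_fn=list,  # keep each batch as a plain list of indices
    )
    orders = []
    for epoch in range(num_epochs):
        sampler.set_epoch(epoch)  # called before iter(loader) creates or resets the iterator
        orders.append([i for batch in loader for i in batch])
    return orders


if __name__ == "__main__":  # guard needed when workers are started with spawn/forkserver
    with_persistent = epoch_orders(persistent_workers=True)
    without_persistent = epoch_orders(persistent_workers=False)
    print(with_persistent == without_persistent)
```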

In this case, is there a way to have a custom sampler move on to the next epoch without ending the current one?
A naive approach would be to have the sampler generate the indices for all epochs in one go, but that is very hacky, since we would no longer have proper knowledge of the current epoch.
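The naive approach could look roughly like this sketch. MultiEpochBatchSampler is a hypothetical name, not a PyTorch class. It shuffles each epoch with seed + epoch (mimicking how DistributedSampler seeds its shuffle) and yields batches that never cross an epoch boundary. Because a single DataLoader iterator covers all epochs, the workers keep prefetching across epoch boundaries, but the training loop has to reconstruct the epoch from the step count, which is the loss of epoch knowledge described above.

```python
import torch
from torch.utils.data import DataLoader, Sampler


class MultiEpochBatchSampler(Sampler):
    """Hypothetical batch sampler that yields the batches of all epochs in one pass."""

    def __init__(self, dataset_len, batch_size, num_epochs, seed=0):
        self.dataset_len = dataset_len
        self.batch_size = batch_size
        self.num_epochs = num_epochs
        self.seed = seed

    def batches_per_epoch(self):
        # the last batch of an epoch may be smaller; it never spills into the next epoch
        return (self.dataset_len + self.batch_size - 1) // self.batch_size

    def __iter__(self):
        for epoch in range(self.num_epochs):
            g = torch.Generator()
            g.manual_seed(self.seed + epoch)  # per-epoch shuffle, similar to set_epoch
            order = torch.randperm(self.dataset_len, generator=g).tolist()
            for start in range(0, self.dataset_len, self.batch_size):
                yield order[start:start + self.batch_size]

    def __len__(self):
        return self.num_epochs * self.batches_per_epoch()


if __name__ == "__main__":
    dataset = list(range(10))  # dataset[i] == i, so batches show the sampled indices
    batch_sampler = MultiEpochBatchSampler(len(dataset), batch_size=4, num_epochs=3)
    # One iterator spans all epochs, so workers keep prefetching across epoch boundaries
    loader = DataLoader(dataset, batch_sampler=batch_sampler, num_workers=2, collate_fn=list)
    steps_per_epoch = batch_sampler.batches_per_epoch()
    seen = {}
    for step, batch in enumerate(loader):
        epoch = step // steps_per_epoch  # the epoch has to be recovered from the step count
        seen.setdefault(epoch, []).extend(batch)
    print({epoch: sorted(idx) == list(range(10)) for epoch, idx in seen.items()})
```

The step-count bookkeeping only works because batches are kept within a single epoch; a flat index sampler combined with a regular BatchSampler could produce batches that mix two epochs.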