Restore dataloader within an epoch

Dear Community,

I would like to restore my DataLoader in the same way as the model/scheduler/optimizer via a state_dict. Since I may have to pause my training inside an epoch, I would like to resume right where it stopped. However, the DataLoader has no state_dict, unlike other PyTorch classes. I use islice from the itertools library to continue my training from a given step within the epoch. Unfortunately, I cannot find any way to set the index list inside the sampler of the DataLoader. This is not an issue if my data is iterated sequentially, but for shuffled iteration it is. Inside RandomSampler, __iter__(self) builds the index list like this: iter(torch.randperm(n).tolist()) [1]. If this list were part of the sampler, I would be able to get and set it. My question is whether there is a way I have missed to restore the sampler's index list? Otherwise, you might want to consider and discuss my proposal to make the torch.randperm(n).tolist() list part of the sampler itself.
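To make the proposal concrete, here is a minimal sketch of what I have in mind. StatefulRandomSampler is a name I made up, not an existing PyTorch class, and the sketch assumes num_workers=0 so the sampler is not copied into worker processes:

```python
import torch
from torch.utils.data import Sampler


class StatefulRandomSampler(Sampler):
    """Hypothetical RandomSampler variant that keeps the permutation as state."""

    def __init__(self, data_source):
        self.data_source = data_source
        self.perm = None  # current epoch's index permutation
        self.pos = 0      # how many indices have already been handed out

    def __iter__(self):
        if self.perm is None:
            self.perm = torch.randperm(len(self.data_source)).tolist()
        # Resume from self.pos instead of always starting at index 0.
        while self.pos < len(self.perm):
            idx = self.perm[self.pos]
            self.pos += 1
            yield idx
        # Epoch finished: reset so the next epoch draws a fresh permutation.
        self.perm, self.pos = None, 0

    def __len__(self):
        return len(self.data_source)

    def state_dict(self):
        return {"perm": self.perm, "pos": self.pos}

    def load_state_dict(self, state):
        self.perm = state["perm"]
        self.pos = state["pos"]
```

With something like this, the sampler's state could be saved and restored alongside the model/optimizer checkpoints, and a restored sampler would continue the epoch from the saved position.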

Best,
Paul

[1] https://pytorch.org/docs/stable/_modules/torch/utils/data/sampler.html#Sampler


If I set a seed with torch.manual_seed(n) before iterating my DataLoader in every epoch, I ensure that the shuffling is reproducible when I restore training at a given step of the epoch. After every epoch I generate a new seed, so when I stop training within an epoch I only have to save that seed (and the current step).
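In case it is useful, roughly what this looks like in code. The function name, arguments, and checkpoint layout are just illustrative, and the sketch assumes the DataLoader's default sampler, which derives its shuffling from the global RNG when no generator is passed; note that islice still loads and discards the skipped batches:

```python
import itertools

import torch
from torch.utils.data import DataLoader


def run_epoch(dataset, epoch_seed, start_step=0, batch_size=32):
    # Re-seeding before building the loader makes the shuffled order
    # reproducible, so skipping the first `start_step` batches with islice
    # lands exactly where training previously stopped.
    torch.manual_seed(epoch_seed)
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

    for step, batch in enumerate(
        itertools.islice(loader, start_step, None), start=start_step
    ):
        ...  # training step on `batch`
        # When checkpointing mid-epoch, save enough to resume:
        # checkpoint = {"epoch_seed": epoch_seed, "step": step + 1, ...}
```

Resuming then means reloading the saved seed and step and calling run_epoch(dataset, epoch_seed=saved_seed, start_step=saved_step).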