Resume a DataLoader iteration from where I stopped last time?

I implemented a torch.utils.data.Dataset and use torch.utils.data.DataLoader to iterate through it. The dataset is huge, and I may not be able to finish one full pass in a single experiment. My question is: how do I save the state of the current DataLoader so that next time I can resume from where I left off, instead of starting again from the beginning of the iteration?

Or, if I simply use RandomSampler without setting the random seed, will I get a different batch to start with next time?

If you don’t terminate the program, you could just pull batches one at a time with dat = next(data_iter), where data_iter = iter(dl) is an iterator you keep alive (note that the DataLoader itself is not an iterator, so you need the iter call), instead of consuming everything in a for loop.
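
Something along these lines (a minimal sketch; the TensorDataset here is just a stand-in for your own Dataset):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset; replace with your own Dataset implementation.
dataset = TensorDataset(torch.arange(100).float())
dl = DataLoader(dataset, batch_size=8, shuffle=True)

data_iter = iter(dl)  # keep this iterator alive between experiments

# Pull one batch whenever you are ready to do more work.
try:
    dat = next(data_iter)
except StopIteration:
    # The epoch is exhausted; start a fresh pass if desired.
    data_iter = iter(dl)
    dat = next(data_iter)
```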

If you do run separate programs, it might be more straightforward to introduce a stable split of the dataset or something like that (it is probably also good for reproducibility).
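
A sketch of that idea, under the assumption that shuffling once with a fixed seed and checkpointing the not-yet-seen indices is acceptable (the file name indices.pt and the stand-in dataset are made up for illustration):

```python
import os
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset

dataset = TensorDataset(torch.arange(100).float())  # stand-in dataset
CKPT = "indices.pt"  # hypothetical file holding the not-yet-seen indices

if os.path.exists(CKPT):
    remaining = torch.load(CKPT)  # resume: indices left over from the last run
else:
    g = torch.Generator().manual_seed(0)  # fixed seed -> stable order across runs
    remaining = torch.randperm(len(dataset), generator=g).tolist()

batch_size = 8
# No shuffle here: the order is already fixed by the seeded permutation.
dl = DataLoader(Subset(dataset, remaining), batch_size=batch_size)

for i, batch in enumerate(dl):
    # ... training step on batch ...
    # Persist the indices we have not consumed yet, so a later run
    # can pick up exactly where this one stopped.
    torch.save(remaining[(i + 1) * batch_size:], CKPT)
```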

Best regards

Thomas
