Restart a dataloader iteration from where I stopped last time?

I implemented a dataset and use a dataloader to iterate through it. The dataset is huge, and I may not be able to finish one full pass in a single experiment. My question is: how do I save the state of the current dataloader so that next time I can resume from where I left off, instead of starting from the beginning of the iteration?

Or, if I simply use RandomSampler and don't set the random seed, will I get a different batch to start with next time?

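If you don't terminate the program, you could create the iterator once with it = iter(dl) and then get the data using dat = next(it) (Python 3 syntax) instead of a for loop. Calling next() on the dataloader itself does not work, because the dataloader is an iterable rather than an iterator.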
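As a rough sketch of pulling batches on demand within one running program: the iterator is created once and then reused, so each call continues where the previous one stopped. The helper name take_batches and the list dl are made up for illustration. The list is only a stand-in for a real dataloader, on the assumption that the dataloader behaves like any other Python iterable here.

```python
from itertools import islice


def take_batches(batch_iter, n):
    """Pull up to n batches from an existing iterator without restarting it."""
    return list(islice(batch_iter, n))


# Stand-in for a dataloader (assumption): any iterable of batches.
dl = [[0, 1], [2, 3], [4, 5], [6, 7]]

# Create the iterator once; calling iter(dl) again would start over.
batch_iter = iter(dl)

first = take_batches(batch_iter, 2)
# ... do other work here (evaluate, save a model checkpoint, etc.) ...
second = take_batches(batch_iter, 2)  # continues after the first two batches
```

This only helps while the process stays alive; the iterator's position is lost once the program exits.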

If you do run separate programs, it might be more straightforward to split the dataset into stable, fixed parts and process one part per run (this is probably also good for reproducibility).
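One possible way to get such a stable split, sketched under some assumptions: shuffle the sample indices with a fixed seed and cut them into shards, so every run can recompute the same shards and only needs to remember which shard comes next. The function name stable_shards, the seed value, and the next_shard variable are hypothetical. The seeded shuffle should be repeatable on the same Python version, but that is not guaranteed across versions.

```python
import random


def stable_shards(num_samples, num_shards, seed=0):
    """Split sample indices into num_shards lists using a fixed seed."""
    indices = list(range(num_samples))
    # A private Random instance keeps the global random state untouched.
    random.Random(seed).shuffle(indices)
    # Shard i takes every num_shards-th index, starting at position i.
    return [indices[i::num_shards] for i in range(num_shards)]


shards = stable_shards(num_samples=10, num_shards=3, seed=1234)

# Hypothetical: in practice this counter would be read from a small
# progress file written at the end of the previous run.
next_shard = 1
indices_for_this_run = shards[next_shard]
```

The index list for the current run could then be used to restrict the dataset, for example with torch.utils.data.Subset(dataset, indices_for_this_run), before building the dataloader.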

Best regards

