How to Save DataLoader?

Hi,

I am new to PyTorch and currently experimenting with PyTorch’s DataLoader on Google Colab. My experiments often require training times of over 12 hours, which is more than what Google Colab offers. For this reason, I need to be able to save my optimizer, learning rate scheduler, and training state at specific epoch checkpoints (e.g., every epoch that is a multiple of 5).
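
For concreteness, a rough sketch of the kind of periodic checkpoint I mean (model, optimizer, and scheduler stand for my actual objects):

import torch

# Save the training state every 5th epoch.
if epoch % 5 == 0:
    torch.save({
        'epoch': epoch,
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
        'scheduler_state_dict': scheduler.state_dict(),
    }, f'checkpoint_epoch_{epoch}.pth')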

I also need to save the data loaded per mini-batch (the batch size is either 32 or 64). However, I could not find a way to save the current state of the DataLoader so it can be reused in later training epochs. How can I save the current state of a DataLoader? Thanks in advance.

Note: this may be related to this issue on PyTorch’s GitHub.

Can’t we use a random seed to get the same list? Then knowing the index will suffice.

Maybe something like this:

import random
from torch.utils.data import Sampler, DataLoader

class MySampler(Sampler):
    """A sampler whose ordering is an explicit list that can be saved and restored."""
    def __init__(self, data_source):
        self.seq = list(range(len(data_source)))

    def __iter__(self):
        return iter(self.seq)

    def __len__(self):
        return len(self.seq)

dataset = LoadYourDataset()  # your own Dataset
sampler = MySampler(dataset)
dataloader = DataLoader(dataset, sampler=sampler)  # a custom sampler replaces shuffle=True

for epoch in range(999):
    random.shuffle(dataloader.sampler.seq)  # reshuffle the index list each epoch
    for i, (x, y) in enumerate(dataloader):
        # training step here
        # save i and dataloader.sampler.seq so the epoch can be resumed

I cannot promise that this code will work (just an idea).
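
If it does work, resuming mid-epoch should just be a matter of restoring the saved permutation and skipping the batches that were already processed. A rough sketch, assuming saved_seq and saved_i are the values stored in the loop above:

dataloader.sampler.seq = saved_seq  # restore the shuffle order of the interrupted epoch
for i, (x, y) in enumerate(dataloader):
    if i <= saved_i:
        continue  # this batch was already processed before the interruption
    # training step here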

Thank you for the answers. After trying some code of my own yesterday, I found that a DataLoader can be saved directly with PyTorch’s torch.save(dataloader_obj, 'dataloader.pth'). The order of the data has been maintained so far, and so have the batches.
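
For anyone who wants to try the same thing, roughly (note that this pickles the whole object, dataset included, so the dataset class must be importable when loading):

import torch

# Persist the DataLoader object itself (its dataset and sampler are pickled with it).
torch.save(dataloader, 'dataloader.pth')

# Later, e.g. in a fresh Colab session:
dataloader = torch.load('dataloader.pth')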
