For GAN training, people seem to do it like this:
import torch

trainset = GanLoader(imgs)
sampler = torch.utils.data.RandomSampler(trainset, replacement=True, num_samples=config['num_samples'])
train_loader = torch.utils.data.DataLoader(trainset, num_workers=config['num_workers'],
                                           sampler=sampler)
For example, say I saved a checkpoint at step 10k and exited the program. When resuming the training loop, can I set the DataLoader to step 10k so that it yields exactly the same samples as it would have without interrupting execution?
No, you most likely wouldn't get the same DataLoader state if you haven't used e.g. epoch seeds before. In your current code snippet you are using a RandomSampler without passing a generator, so a new one will be initialized, as seen here.
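A minimal sketch of the alternative, passing an explicitly seeded generator to RandomSampler so its draws are repeatable (the toy dataset and seed value here are assumptions, not from the original snippet):

```python
import torch
from torch.utils.data import RandomSampler, TensorDataset

# Hypothetical toy stand-in for the GAN image dataset.
dataset = TensorDataset(torch.arange(10).float())

# Pass an explicitly seeded generator instead of letting the
# sampler initialize a fresh one on its own.
g = torch.Generator()
g.manual_seed(0)
sampler = RandomSampler(dataset, replacement=True, num_samples=20, generator=g)

run1 = list(sampler)   # draws 20 indices, advancing g

g.manual_seed(0)       # re-seed the same generator object
run2 = list(sampler)   # yields the identical index sequence

assert run1 == run2
```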
Well, this seed should depend on the global seed. Can I skip the first n steps in the generator?
Maybe I should save the generator/dataloader at the checkpoint?
If you’ve seeded your code before the beginning of the training, this might work.
If not, your new process would use a new random seed and I don’t think your approach would work.
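One way to make the checkpointing idea concrete is to snapshot the sampler's generator with `torch.Generator.get_state()` and restore it with `set_state()` at resume time. A sketch at an epoch boundary, with a toy dataset standing in for the real one (names and sizes are assumptions):

```python
import torch
from torch.utils.data import RandomSampler, TensorDataset

dataset = TensorDataset(torch.arange(10).float())  # toy stand-in

g = torch.Generator()
g.manual_seed(0)
sampler = RandomSampler(dataset, replacement=True, num_samples=20, generator=g)

epoch1 = list(sampler)   # advances g
state = g.get_state()    # snapshot; store this tensor in the checkpoint
epoch2 = list(sampler)   # what an uninterrupted run would sample next

# "Restart": restore the snapshot, and the next epoch's indices match exactly.
g.set_state(state)
epoch2_resumed = list(sampler)
assert epoch2_resumed == epoch2
```

Note this resumes sampling from an epoch boundary; it does not by itself fast-forward into the middle of an epoch.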
So, I tested it for a bit:
from torch.utils.data import DataLoader, Dataset, RandomSampler

class DSet(Dataset):
    def __init__(self, num=10):
        self.data = range(num)
    def __len__(self):
        return len(self.data)
    def __getitem__(self, index):
        return self.data[index]

trainset = DSet()
sampler = RandomSampler(trainset, replacement=True, num_samples=20)
train_loader = DataLoader(trainset, num_workers=0, sampler=sampler, batch_size=2)

for i in train_loader:
    print(i)
Setting a manual seed before creating the DataLoader makes it reproducible, but how can I resume training from the exact step?
Using continue inside the loop would still read all the data from disk.
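One way to avoid re-reading the data is to skip only the sampler *indices*, not the samples themselves: materialize the seeded index sequence (which is cheap and loads nothing), slice off the batches already trained, and pass the remainder to the DataLoader, which accepts a plain iterable of indices as its sampler argument. A sketch under those assumptions; `resume_step` and the toy dataset are hypothetical:

```python
import torch
from torch.utils.data import DataLoader, RandomSampler, TensorDataset

dataset = TensorDataset(torch.arange(10).float())  # toy stand-in

g = torch.Generator()
g.manual_seed(0)
# Materialize the epoch's full index sequence -- no data is loaded here.
all_indices = list(RandomSampler(dataset, replacement=True,
                                 num_samples=20, generator=g))

batch_size = 2
resume_step = 3  # hypothetical: batches already consumed before the restart

# Drop only the already-used indices, then feed the rest to the loader.
remaining = all_indices[resume_step * batch_size:]
loader = DataLoader(dataset, sampler=remaining, batch_size=batch_size)

resumed = [b[0].tolist() for b in loader]
```

The resumed batches are then identical to what an uninterrupted run would have produced from that step onward, without touching the first `resume_step` batches on disk.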