In DataLoader, we can specify the worker_init_fn argument to change the seed accordingly. However, this function is only called once, when each worker is initialized. Is there any way to change the seed of a worker at runtime, for example every n minibatches?
Thanks in advance!
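For context, a minimal sketch of the worker_init_fn approach described above (the function name seed_worker and the base seed 1234 are made up for this illustration):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class RandDataset(Dataset):
    def __getitem__(self, idx):
        return torch.randn(1)

    def __len__(self):
        return 20

def seed_worker(worker_id):
    # Called once per worker process, at worker startup only,
    # which is exactly the limitation described above.
    torch.manual_seed(1234 + worker_id)

loader = DataLoader(RandDataset(),
                    batch_size=5,
                    num_workers=2,
                    worker_init_fn=seed_worker)
```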
Yes, you can set the seed via torch.manual_seed inside Dataset.__getitem__ and could also use the worker info, as seen here:
import torch
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    def __init__(self):
        pass

    def __getitem__(self, idx):
        # Runs inside the worker process, so reseeding here takes effect at runtime.
        worker_info = torch.utils.data.get_worker_info()
        if worker_info is not None:
            print(worker_info)
            worker_id = worker_info.id
            torch.manual_seed(worker_id)
        return torch.randn(1)

    def __len__(self):
        return 20

dataset = MyDataset()
dataloader = DataLoader(dataset,
                        batch_size=5,
                        shuffle=False,
                        num_workers=8)

for data in dataloader:
    print(data)
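For the specific case of reseeding every n minibatches, one possible sketch (assuming shuffle=False, and that the dataset is told the batch_size and interval n, both made-up parameters for this illustration) derives the seed from the batch index inside __getitem__:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ReseedingDataset(Dataset):
    def __init__(self, length=20, batch_size=5, n=2):
        # batch_size and n are assumptions for this sketch: the dataset
        # needs them to map a sample index to a batch index.
        self.length = length
        self.batch_size = batch_size
        self.n = n

    def __getitem__(self, idx):
        # Batch index of this sample (only valid with shuffle=False).
        batch_idx = idx // self.batch_size
        worker_info = torch.utils.data.get_worker_info()
        worker_id = worker_info.id if worker_info is not None else 0
        # New seed every n minibatches, offset per worker; the factor
        # 1000 is an arbitrary spacing to keep worker seeds disjoint.
        seed = (batch_idx // self.n) * 1000 + worker_id
        torch.manual_seed(seed)
        return torch.randn(1)

    def __len__(self):
        return self.length

dataset = ReseedingDataset()
loader = DataLoader(dataset, batch_size=5, shuffle=False, num_workers=0)
for data in loader:
    print(data)
```

Note the caveat: because every __getitem__ call in the same n-batch window reseeds with the same value, torch.randn returns identical samples within that window; mix idx into the seed if each sample should still differ.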
Huh, it never crossed my mind to hijack __getitem__. Thanks @ptrblck, the Twitter-famous celebrity