utils.DataLoader and repeatability?

Hello!

I want to get the same results when I re-run the same code several times in a row, for reproducibility purposes.

To set the seed, I use the following lines:

import random
import numpy as np
import torch

# Seed every RNG I am aware of (Python, NumPy, PyTorch CPU and GPU)
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
torch.cuda.manual_seed_all(seed)

But I do not get the same results at every run. I have noticed that the problem comes from the DataLoader: the sequence of data delivered by the DataLoader is different in each run, EXCEPT when I use num_workers=0. So I think it is the prefetching done by the worker processes (which have independent seeds) that is causing this problem.
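
For context, my loader is built roughly like this (my_dataset and the batch size are just placeholders for my actual setup):

train_loader = torch.utils.data.DataLoader(
    my_dataset,          # placeholder for my actual Dataset
    batch_size=32,       # placeholder value
    shuffle=True,
    num_workers=4,       # with num_workers=0 the data order is reproducible
)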

In the doc, you can read:

By default, each worker will have its PyTorch seed set to base_seed + worker_id, where base_seed is a long generated by the main process using its RNG. However, seeds for other libraries may be duplicated upon initializing workers (e.g., NumPy), causing each worker to return identical random numbers. (See My data loader workers return identical random numbers section in FAQ.) You may use torch.initial_seed() to access the PyTorch seed for each worker in worker_init_fn, and use it to set other seeds before data loading.

But I don’t really understand what I can do with that, or what callable I should pass as worker_init_fn. Anybody got an idea? My best guess based on that paragraph is sketched below, but I’m not sure it is correct.
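
Something like this is what I have in mind (a minimal sketch based only on my reading of that paragraph; my_dataset is again just a placeholder, and the modulo is there because np.random.seed() only accepts 32-bit integers):

def worker_init_fn(worker_id):
    # Inside a worker, torch.initial_seed() returns base_seed + worker_id.
    # base_seed comes from the main-process RNG that was seeded above,
    # so this should differ per worker but be identical across runs.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

train_loader = torch.utils.data.DataLoader(
    my_dataset,
    batch_size=32,
    shuffle=True,
    num_workers=4,
    worker_init_fn=worker_init_fn,
)

Is that the intended use, or am I missing something?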

(I am using PyTorch 0.3.1.)