I noticed an inconsistency in the documentation and I'm really confused:
I ran into the problem of duplicated NumPy randomness under multiprocessing (a DataLoader with multiple workers), so I decided to pass a worker_init_fn
to the DataLoader constructor. In the note for DataLoader, it is said that "each worker will have its PyTorch seed set to base_seed + worker_id, where base_seed is a long generated by main process using its RNG. You may use torch.initial_seed() to access the PyTorch seed for each worker in worker_init_fn, and use it to set other seeds before data loading.".
But the documentation for torch.initial_seed()
says "Returns the current random seed of the current GPU.", which is not consistent with the Note for DataLoader. My current solution is to set:
```python
worker_init_fn=lambda x: np.random.seed((torch.initial_seed() + x) % (2 ** 32))
```
By printing, I can see that the np.random output is no longer identical across workers, but I'm still not sure I'm doing it correctly.
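To make the question concrete, here is a minimal self-contained sketch of what I'm doing. Note that, per the DataLoader note quoted above, torch.initial_seed() inside a worker should already be base_seed + worker_id, so it may already differ per worker without adding worker_id again. The NoiseDataset here is just an illustrative stand-in for my real dataset:

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset

class NoiseDataset(Dataset):
    """Toy dataset whose samples come from NumPy's global RNG."""
    def __len__(self):
        return 4

    def __getitem__(self, idx):
        # If workers are not reseeded, forked workers can all draw
        # the same values here.
        return int(np.random.randint(0, 1_000_000))

def worker_init_fn(worker_id):
    # torch.initial_seed() in a worker is base_seed + worker_id (per the
    # DataLoader note), so it is already distinct per worker; clamp it to
    # NumPy's 32-bit seed range.
    np.random.seed(torch.initial_seed() % (2 ** 32))

loader = DataLoader(NoiseDataset(), num_workers=2, worker_init_fn=worker_init_fn)
samples = [int(x) for x in loader]
print(samples)
```

With this, printing the samples shows different values in each worker, which is what I observe with my lambda as well.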
I would also like to use the transforms library in torchvision; it uses the standard random library. Should I also set that seed through worker_init_fn
?
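If the answer is yes, something like this is what I have in mind for the worker_init_fn, assuming the same seed derivation works for the random module (just a sketch, not sure it's the right approach):

```python
import random
import torch

def worker_init_fn(worker_id):
    # Derive a per-worker seed and also seed Python's global `random`
    # module, since some torchvision transforms draw from it.
    random.seed(torch.initial_seed() % (2 ** 32))
```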