I noticed an inconsistency in the documentation and I'm really confused:
I ran into the problem of duplicated NumPy randomness under multiprocessing (a DataLoader with multiple workers), so I decided to pass a worker_init_fn
to the DataLoader constructor. In the note for DataLoader, it is said that "each worker will have its PyTorch seed set to base_seed + worker_id, where base_seed is a long generated by main process using its RNG. You may use torch.initial_seed() to access the PyTorch seed for each worker in worker_init_fn, and use it to set other seeds before data loading.".
But the documentation for torch.initial_seed()
says "Returns the current random seed of the current GPU.", which is not consistent with the Note for DataLoader. My current solution is to set:
```python
worker_init_fn=lambda x: np.random.seed((torch.initial_seed() + x) % (2 ** 32))
```
By printing, I can see that the np.random output is no longer identical across workers, but I'm still not sure I'm doing it correctly.
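To make the question concrete, here is a minimal self-contained sketch of what I'm doing. Note that, per the DataLoader note quoted above, torch.initial_seed() inside a worker should already be base_seed + worker_id, so it may already differ per worker without adding worker_id again. The NoiseDataset here is just an illustrative stand-in for my real dataset:

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset

class NoiseDataset(Dataset):
    """Toy dataset whose samples come from NumPy's global RNG."""
    def __len__(self):
        return 4

    def __getitem__(self, idx):
        # If workers are not reseeded, forked workers can all draw
        # the same values here.
        return int(np.random.randint(0, 1_000_000))

def worker_init_fn(worker_id):
    # torch.initial_seed() in a worker is base_seed + worker_id (per the
    # DataLoader note), so it is already distinct per worker; clamp it to
    # NumPy's 32-bit seed range.
    np.random.seed(torch.initial_seed() % (2 ** 32))

loader = DataLoader(NoiseDataset(), num_workers=2, worker_init_fn=worker_init_fn)
samples = [int(x) for x in loader]
print(samples)
```

With this, printing the samples shows different values in each worker, which is what I observe with my lambda as well.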
I would also like to use the transforms library in torchvision; it uses the standard random library. Should I also set that seed through worker_init_fn
?
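If the answer is yes, something like this is what I have in mind for the worker_init_fn, assuming the same seed derivation works for the random module (just a sketch, not sure it's the right approach):

```python
import random
import torch

def worker_init_fn(worker_id):
    # Derive a per-worker seed and also seed Python's global `random`
    # module, since some torchvision transforms draw from it.
    random.seed(torch.initial_seed() % (2 ** 32))
```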