Random seed is reset inside DataLoader? [maybe bug]

I’m using an IterableDataset inside a DataLoader (multiple workers). My IterableDataset code calls numpy.random functions in a few places. After a while I noticed that in each epoch, the sequence of values returned by those random functions is exactly the same! In other words, every worker is (somehow) reset to the same random seed at the beginning of each epoch (or when it is created). So if, for example, a worker does random image crops with positions drawn from numpy.random, it produces the same crops for each image in every epoch.

How/where is the seed set? Is this expected behavior?

WHY does this happen? I would expect numpy.random in each worker to behave as it does in a fresh process, unless numpy.random.seed is explicitly called by user code.

(I was not doing anything explicit to set the seed, either with numpy or torch calls, or anything to make torch deterministic. This seems to just be the default behavior - torch modifying numpy to make it deterministic without being asked to by the user.)

Simple code to reproduce:

import torch
import numpy as np
from torch.utils.data import DataLoader, Dataset

class TestIterableDataset(torch.utils.data.IterableDataset):
    def __init__(self):
        super().__init__()

    def __iter__(self):
        worker_info = torch.utils.data.get_worker_info()
        for n in range(10):
            yield (worker_info.id, np.random.randint(1000000))
            
ds = TestIterableDataset()

for worker_id, number in DataLoader(ds, batch_size=4, num_workers=2):
    print(worker_id, number)

# This prints the same result every time it is run, and the same sequence from each worker:
# tensor([0, 0, 0, 0]) tensor([ 68669, 230721, 801136, 274196])
# tensor([1, 1, 1, 1]) tensor([ 68669, 230721, 801136, 274196])
# tensor([0, 0, 0, 0]) tensor([617084, 429589, 436968, 718987])
# tensor([1, 1, 1, 1]) tensor([617084, 429589, 436968, 718987])
# tensor([0, 0]) tensor([150977,  59469])
# tensor([1, 1]) tensor([150977,  59469])

I opened an issue here: https://github.com/pytorch/pytorch/issues/41329
and apparently it’s not a bug, just a gotcha - a pretty well-documented one at that.

Numpy’s random number generator keeps its state (and therefore its seed) across a fork(), and DataLoader starts worker processes using fork without doing anything special about numpy.random. I’d say it seems like much more of a numpy issue - keeping the seed across a fork by default is a kind of wtf behavior, but okay. It’s easy enough to work around: re-seeding in worker_init_fn or at the beginning of __iter__ works fine.
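
To see that this is fork behavior rather than anything DataLoader-specific, here is a minimal (Unix-only) sketch outside of PyTorch; both processes print the same three numbers because the child inherits the parent’s numpy RNG state:

import os
import numpy as np

# Nothing is drawn before the fork, so parent and child hold identical
# numpy RNG state at the moment the child process is created.
pid = os.fork()
if pid == 0:
    # Child process: inherits the parent's numpy RNG state.
    print("child: ", np.random.randint(1000000, size=3))
    os._exit(0)
else:
    os.waitpid(pid, 0)
    # Parent process: prints the very same numbers the child just did.
    print("parent:", np.random.randint(1000000, size=3))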

That does not solve the randomness problem with numpy+pytorch across epochs, though. For example, you will still get exactly the same numpy random numbers in each epoch! Does anyone know a “clean” solution to this? I have to call (from inside dataset.__getitem__) some very messy (possibly wrong) code to make every worker use a different numpy random seed in each epoch:
import random

import numpy as np
import torch

def set_cuda_rand_seed():
    worker = torch.utils.data.get_worker_info()
    new_seed = np.random.randint(0, 2 ** 32 - 1)
    if worker is not None:
        new_seed = worker.seed
    new_seed = int(new_seed) % (2 ** 32 - 1)
    random.seed(new_seed)
    np.random.seed(new_seed)  # todo <— bug using num_workers>0
    return new_seed
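
For what it’s worth, a less messy sketch of the same idea (seed_worker is just an illustrative name; ds is the dataset from the snippet above): re-seed numpy and random in worker_init_fn from torch.initial_seed(), which already differs per worker and, because a fresh base seed is drawn each time the DataLoader is iterated, per epoch as well:

import random
import numpy as np
import torch
from torch.utils.data import DataLoader

def seed_worker(worker_id):
    # torch.initial_seed() inside a worker is derived from a base seed that
    # changes on every new DataLoader iterator, plus the worker id, so it
    # differs across workers and across epochs.
    seed = torch.initial_seed() % (2 ** 32)
    np.random.seed(seed)
    random.seed(seed)

loader = DataLoader(ds, batch_size=4, num_workers=2, worker_init_fn=seed_worker)
for worker_id, number in loader:
    print(worker_id, number)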