DataLoader behaves the same at every epoch with np.random

In my code, I need to use np.random.randint() to randomly crop my data.
However, the random numbers generated at every epoch are the same.
I think they should be different. Does DataLoader set np.random.seed() at the beginning of each epoch? I will post my simplified code below.

Environment

Here is my environment:

➜  ~ uname -a 
Linux tkit-pc 4.16.13-1-ARCH #1 SMP PREEMPT Thu May 31 23:29:29 UTC 2018 x86_64 GNU/Linux
➜  ~ pip list | grep numpy
numpy              1.14.3         
➜  ~ pip list | grep torch
torch              0.4.0a0+200fb22

Code

from torch.utils.data import Dataset, DataLoader
import numpy as np

class MyDataset(Dataset):
    def __init__(self):
        pass

    def __getitem__(self, idx):
        # Stand-in for the random crop: draw one random integer per sample.
        result = np.random.randint(0, 1000)
        return result

    def __len__(self):
        return 5

dataset = MyDataset()

loader = DataLoader(dataset, batch_size=1, num_workers=1)
for epoch in range(3):
    print("Epoch:{}".format(epoch))
    for item in loader:
        print(item)

Result

Epoch:0
tensor([ 106])
tensor([ 450])
tensor([ 685])
tensor([ 648])
tensor([ 959])
Epoch:1
tensor([ 106])
tensor([ 450])
tensor([ 685])
tensor([ 648])
tensor([ 959])
Epoch:2
tensor([ 106])
tensor([ 450])
tensor([ 685])
tensor([ 648])
tensor([ 959])

This is a known issue and is explained in the notes of the DataLoader docs: each worker process inherits the parent process's NumPy RNG state when it is forked, so the workers start from the same state at every epoch.
You should use worker_init_fn to seed other libraries like numpy.
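
A minimal sketch of such a worker_init_fn, assuming the behavior where each worker's torch seed is base_seed + worker_id and the base seed is redrawn every epoch:

import numpy as np
import torch
from torch.utils.data import DataLoader

def worker_init_fn(worker_id):
    # torch.initial_seed() returns this worker's seed (base_seed + worker_id).
    # The base seed changes every epoch, so NumPy gets a fresh seed per
    # worker per epoch. The modulo keeps it within NumPy's 32-bit range.
    np.random.seed(torch.initial_seed() % 2**32)

# Reusing the dataset from the question above.
loader = DataLoader(dataset, batch_size=1, num_workers=1,
                    worker_init_fn=worker_init_fn)

With this in place, each epoch should print a different sequence of values.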

Thanks for your answer. This behavior had confused me a lot.