Shuffle issue in DataLoader: how to get the same shuffle results with a fixed seed but a different network?

The shuffle order produced by DataLoader changes depending on the network architecture.

I set a fixed random seed at the start, as below:

import random
import numpy as np
import torch

random.seed(1)
np.random.seed(1)
torch.manual_seed(1)
torch.cuda.manual_seed(1)
torch.backends.cudnn.deterministic = True

and I get the same shuffle results on every run.

But if I change the network (for example, reduce the channels of each layer) while keeping the same random seed, the shuffle results change.

How can I get the same shuffle results in this case?

My environment is as follows:
Ubuntu 16.04 LTS, PyTorch 0.4.1


This sounds a bit weird to me. I never had this issue.
Did you try with num_workers=0? And do you use the same batch size etc.?

I tried num_workers=0 as well as non-zero values; it doesn't seem to help with this issue.

Of course I used the same batch size, the same training dataset, and the same seeds; only the network changed.

I also tried on another PC, and the issue exists there too.

it’s really weird…

hi @tom, do you have any ideas?

I think you are initializing the network before the dataloader. For this reason, when you change the network size, the samples generated by the dataloader also change. As you know, all filters and biases are normally initialized using random methods. A change in the number of times those random methods are called changes the state of the random generator, and this can be why your dataloader is generating different values.
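A minimal sketch of this effect (the layer and dataset sizes here are made up for illustration): weight init draws from the global RNG, so a differently sized layer leaves the generator in a different state before the loader shuffles.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def first_batch(in_features):
    torch.manual_seed(1)  # same seed every time
    # Weight init draws from the global RNG; a layer with more
    # parameters consumes more random numbers, leaving the
    # generator in a different state afterwards.
    _layer = torch.nn.Linear(in_features, 1)
    dataset = TensorDataset(torch.arange(10).float())
    loader = DataLoader(dataset, batch_size=4, shuffle=True)
    return next(iter(loader))[0].tolist()

print(first_batch(4))  # reproducible for a fixed architecture...
print(first_batch(8))  # ...but typically differs once the layer size changes
```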


Great example of why one should never tag people. :slight_smile:
As @amosella explains much better than I could, the random init leaves you with a different RNG state depending on the number of parameters (and possibly the init method).
So the easiest solution is to re-seed after the weight init and before the dataloader init. If I remember correctly, the CPU random generator is the important one here.
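A sketch of that ordering, with a toy model and dataset (the sizes are assumptions for illustration):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(1)
model = torch.nn.Linear(16, 4)  # weight init consumes RNG state

# Re-seed AFTER the weight init and BEFORE building the loader,
# so the shuffle order no longer depends on the model's size.
torch.manual_seed(1)
dataset = TensorDataset(torch.arange(10).float())
loader = DataLoader(dataset, batch_size=4, shuffle=True)
print(next(iter(loader))[0].tolist())
```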

Best regards

Thomas


Hi, @amosella,
You are right. I tried seeding after the network init, and now I always get the same dataloader results.
Thank you so much.

Hi, @tom,
I confirmed that re-seed before dataloader works.
Thank you.


I followed your instructions; however, I cannot reproduce the same results.

This is the function that is called after each time I initialize my network (and before dataloaders):

import random
import numpy as np
import torch

def set_seed(seed):
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    np.random.seed(seed)
    random.seed(seed)

So each time before starting the train loop my code looks like this:

network = network.cuda() if use_cuda else network
set_seed(0)
train_loader = torch.utils.data.DataLoader(train_set, num_workers=0, batch_size=run.batch_size, shuffle=True, worker_init_fn=np.random.seed(0))
set_seed(0)
valid_loader = torch.utils.data.DataLoader(valid_set, num_workers=0, batch_size=run.batch_size, shuffle=True, worker_init_fn=np.random.seed(0))

In my case, I had two versions of the code loading the same data (stored differently). I expected the shuffles to be the same, but the data still came out in different orders. However, the hack above worked once I re-seeded before the dataloader code. I'm not sure what was causing the issue, but it worked.

Have you tried re-seeding in a customized worker_init_fn?
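For reference, one pitfall in the snippet above is that `worker_init_fn=np.random.seed(0)` calls `np.random.seed(0)` immediately and passes its return value (`None`) to the DataLoader, so the workers are never re-seeded. A customized worker_init_fn needs to be a callable taking the worker id, along the lines of this sketch (the stand-in dataset is an assumption for illustration):

```python
import random
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

def seed_worker(worker_id):
    # Inside a worker, torch.initial_seed() is the loader's base seed
    # plus the worker id, so each worker gets a distinct but
    # reproducible seed for NumPy and the random module.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

train_set = TensorDataset(torch.arange(10).float())  # stand-in dataset
torch.manual_seed(0)
train_loader = DataLoader(train_set, batch_size=4, shuffle=True,
                          num_workers=2, worker_init_fn=seed_worker)
```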