I am using DataLoader and, even with a single worker, I am getting a different image order each run. How can I make it generate the same image order every time? (And does this depend on the number of workers?)
In addition, I would like to be able to serialize the DataLoader's internal state to a file, so that if I stop a run in the middle of an epoch and resume it later, I can continue from the same place it stopped.
The DataLoader should provide the same random ordering when seeded, as long as there are no race conditions.
This example gives the same ordering for num_workers=0 or num_workers=1:
```python
import torch
from torch.utils.data import Dataset, DataLoader

torch.manual_seed(2809)
torch.backends.cudnn.deterministic = True  # not necessary in this example

class MyDataset(Dataset):
    def __init__(self):
        self.data = torch.randn(25, 1)

    def __getitem__(self, index):
        print('Index: ', index)
        return self.data[index]

    def __len__(self):
        return len(self.data)

dataset = MyDataset()
loader = DataLoader(dataset,
                    batch_size=5,
                    shuffle=True,
                    num_workers=0,
                    pin_memory=True)

for batch_idx, data in enumerate(loader):
    data = data.to('cuda')
```
If you use a higher number of workers, the order of the samples might differ.
At least I observe this effect on my machine. I assume this is due to race conditions between the different processes.
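As a side note for readers on newer PyTorch versions (1.6+): you can pin the shuffle order regardless of other RNG usage by passing a dedicated, seeded `torch.Generator` to the DataLoader. This argument did not exist in the 0.4-era releases discussed here, so treat this as a sketch for current versions:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.arange(25, dtype=torch.float32).unsqueeze(1))

def make_loader(seed):
    # A dedicated generator isolates the shuffle order
    # from any other calls to the global RNG
    g = torch.Generator()
    g.manual_seed(seed)
    return DataLoader(dataset, batch_size=5, shuffle=True, generator=g)

run1 = [batch[0] for batch in make_loader(2809)]
run2 = [batch[0] for batch in make_loader(2809)]
# same seed -> identical sample order across runs
```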
Thanks for the code. Your example indeed reproduces the numbers in the same order over different trials.
In my code, however, which is in principle very similar to this example, I get different behavior even for num_workers=1. I don't have a clue where to start looking for the source of this problem.
I would appreciate any advice.
Please note that I also asked about saving the state of the DataLoader so that I can stop and resume a run mid-epoch. If you can, please address that part as well.
Do you call any other random functions from another library like numpy?
If possible, I would remove unnecessary parts of the code and check whether a minimal example produces the deterministic behavior.
Regarding your second question: @albanD posted a suggestion here
Thanks… I traced the problem to (unordered) dictionary creation and iteration, which led to different results on every run. After switching to OrderedDict, things work as expected. I wasn't aware that there is any randomness involved in a standard dictionary (I thought the keys were unordered, but not in a random sense).
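For illustration, the usual fix is to iterate in an explicitly defined order, e.g. via `collections.OrderedDict` or by sorting the keys. (Note that on Python 3.7+ plain dicts also preserve insertion order; hash randomization could reorder older dicts, and still affects sets, between runs.)

```python
from collections import OrderedDict

# An explicitly ordered mapping: iteration follows insertion order
layers = OrderedDict()
layers['conv1'] = 16
layers['conv2'] = 32
layers['fc'] = 10

# Sorting the keys is an alternative that is deterministic for any mapping
for name in sorted(layers):
    print(name, layers[name])
```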
Regarding my second question, I went over the linked code, but I would still like to use RandomSampler. If I understand correctly, it is sufficient to record two variables: the RNG state at the time the iterator over the DataLoader is created (since this is when the random permutation is drawn), and the number of "next" requests it has received so far. Is it correct that these two variables capture the state of the DataLoader?
I think you are basically right. The use case would be a bit more complicated if you use more workers.
Also, how are you going to stop the DataLoader? Do you want to stop it with CTRL+C?
If so, you have to take care of stopping all workers, since I've quite often seen zombie Python processes left over from stopped DataLoaders.
I'm not sure this will yield a clean solution.
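A minimal sketch of the approach discussed above for the single-process case (num_workers=0): record the RNG state just before creating the iterator plus the number of batches consumed, then restore the state and fast-forward on resume. All variable names here are illustrative; this is not a built-in DataLoader feature:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(25, 1))
loader = DataLoader(dataset, batch_size=5, shuffle=True, num_workers=0)

# --- checkpoint: the two variables discussed above ---
rng_state = torch.get_rng_state()   # RNG state *before* the permutation is drawn
it = iter(loader)                   # the random permutation is drawn here
batches_consumed = 0

first_batch = next(it)              # consume one batch, then pretend to stop
batches_consumed += 1

# --- resume: restore the RNG state and fast-forward ---
torch.set_rng_state(rng_state)      # same state -> same permutation
resumed_it = iter(loader)
for _ in range(batches_consumed):   # skip the batches already seen
    skipped = next(resumed_it)
```

A caveat from the discussion above: with num_workers > 0 the bookkeeping is more involved, and fast-forwarding still pays the cost of loading the skipped batches.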
Why does torch.backends.cudnn.deterministic = True make my code so slow? I am using a Titan Xp. Without the flag, the code runs fast. I am using cuDNN 7, CUDA 9.1, and PyTorch 0.4.
Yes, cudnn.deterministic=True trades speed for determinism.
If you really need deterministic behavior, you don't have another option.
Usually it's fine to leave it disabled and also set torch.backends.cudnn.benchmark = True to gain a bit more speed.
Thanks. I am writing a network that trains from scratch on my own dataset. My aim is to reproduce the results after each run. However, I cannot reproduce them (the losses differ when I rerun the code). This is my setting.
I just wanted to check whether the results are equal in such a case, which doesn't seem to be the case.
Maybe the difference is caused by some other effect, like the dict/OrderedDict issue?
Could you try to get deterministic results by setting the seeds, setting cudnn to deterministic, etc.?
This would exclude any possible effects like the unknown ordering of Python dicts, as in @MosheM's case.
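A minimal sketch of such a seeding setup, collecting the pieces mentioned in this thread into one helper (the function name is illustrative; `cudnn.benchmark = False` is included because kernel benchmarking can pick different algorithms between runs):

```python
import random

import numpy as np
import torch

def set_deterministic(seed=2809):
    # Seed every RNG that might be involved
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Trade speed for reproducible cuDNN kernels
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```

Calling this once at the start of each run (before building the model and DataLoader) should make the losses repeatable, assuming no other nondeterminism such as unordered dict iteration.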