I found that pre-fetching training samples with multiple threads also introduces randomness: in each new run, the samples are put into the queue in a different order, determined by the relative speed of the threads. I had to set the number of pre-fetching threads to 1 to solve the problem.
What’s more, if the pre-fetching thread (there is only one in this case) is not the main thread (i.e. it runs in parallel to the main thread) and both threads use random numbers, make sure the two threads use different random value generators, each with its own seed. Otherwise, the relative order in which the threads access a shared generator may differ between runs. To get separate generators, for example, the main thread can set the seed with numpy.random.seed(seed) and draw values with numpy.random.uniform(), while the pre-fetching thread creates its own generator with prng = numpy.random.RandomState(seed) and draws values with prng.uniform().
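For instance, here is a minimal sketch of this two-generator setup (the +1 seed offset is my own choice; any two distinct seeds work):

    import numpy

    seed = 42  # arbitrary

    # Main thread: seed the global generator and draw from it.
    numpy.random.seed(seed)
    value_main = numpy.random.uniform()

    # Pre-fetching thread: a private generator with its own seed, so its
    # draws never interleave with the main thread's global stream.
    prng = numpy.random.RandomState(seed + 1)
    value_prefetch = prng.uniform()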
BTW, I implemented the multi-threading in my own way using the threading package, not using the official one.
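To make this concrete, here is a minimal sketch of the kind of single-thread prefetcher described above; the names (prefetch, sample_queue) and the uniform batches are hypothetical stand-ins for real sample loading:

    import queue
    import threading
    import numpy

    SEED = 42          # arbitrary
    NUM_BATCHES = 100
    sample_queue = queue.Queue(maxsize=8)

    def prefetch():
        # A private generator keeps this thread's random draws
        # independent of the main thread's global numpy state.
        prng = numpy.random.RandomState(SEED + 1)
        for _ in range(NUM_BATCHES):
            batch = prng.uniform(size=(32, 10))  # stand-in for a real batch
            sample_queue.put(batch)

    # With a single pre-fetching thread, samples enter the queue
    # in one fixed order, so runs are repeatable.
    worker = threading.Thread(target=prefetch, daemon=True)
    worker.start()

    numpy.random.seed(SEED)  # the main thread keeps using the global generator
    for _ in range(NUM_BATCHES):
        batch = sample_queue.get()
        # ... training step consuming `batch` ...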