I run the following code before defining the modules. (My model uses Embedding, Dropout, LSTM, and Linear layers.)
torch.backends.cudnn.enabled = False
However, the final results still differ from trial to trial (regardless of whether I use the CPU or GPU). Is there anything else I should do to get deterministic results given the same input?
Are you using multiple GPUs? If yes, you need to use
This should be enough and is enough for getting deterministic results in all examples we have. The problem probably lies somewhere in your code.
For example, if you’re using
random modules, you need to seed them too.
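A rough sketch of what seeding everything could look like, assuming the code draws from Python's random module, NumPy, and PyTorch. The helper names seed_everything and draw_samples are made up for illustration, and the torch import is guarded only so the snippet also runs on a machine without PyTorch:

```python
import random

import numpy as np

try:
    import torch  # optional here so the sketch also runs without PyTorch
except ImportError:
    torch = None


def seed_everything(seed):
    """Seed every random number generator this sketch knows about."""
    random.seed(seed)      # Python's built-in generator
    np.random.seed(seed)   # NumPy's global generator
    if torch is not None:
        torch.manual_seed(seed)           # PyTorch CPU generator
        torch.cuda.manual_seed_all(seed)  # all GPUs; silently ignored without CUDA


def draw_samples():
    """Draw one value from each generator so two runs can be compared."""
    values = [random.random(), float(np.random.uniform())]
    if torch is not None:
        values.append(torch.rand(1).item())
    return values


seed_everything(1234)
first = draw_samples()
seed_everything(1234)
second = draw_samples()
print(first == second)
```

If the two lists match, every generator covered by the helper is being reseeded. Any library that keeps its own generator and is not listed in the helper would still need its own seed call.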
Thanks! It was because I was using
BTW, is it deterministic to use cudnn libraries such as
SpatialMaxPooling, which can be non-deterministic in lua torch?
@supakjk I think that for the moment one cannot choose which algorithm will be used by
cudnn in pytorch, so you can’t assume that it will pick the deterministic algo for
SpatialMaxPooling is not deterministic in
Does torch.backends.cudnn.enabled = False guarantee that CUDNN won’t be used, which means that the conv-related operations would be deterministic even if the CUDA versions are used?
@supakjk yes, disabling CUDNN is an option for enforcing determinism
@apaszke why not just suppress
@edgarriba not sure what you mean
@apaszke what’s the use case of having both
manual_seed seeds only the current GPU,
manual_seed_all seeds all of them. We’re thinking about having
torch.manual_seed seed both CPU and all GPUs.
@apaszke nice! it will be very helpful
If I don’t have any random initialization or anything random in my neural networks, is it going to be affected by the
I am trying to understand the role of seeding in pytorch. For example, if I have a model trained with a specific seed, can I say it will produce the same output for a specific input? And with no seed, is it not guaranteed to produce the same output?
Another thing is bothering me: if I train without setting any seed, why would I get different outputs for the same input, given that there is no randomness in my model?
I found that multi-threaded pre-fetching of training samples also introduces randomness. With multiple threads, each new run puts the samples into the queue in a different order, determined by the relative speed of the threads. I had to set the number of pre-fetching threads to 1 to solve the problem.
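To illustrate the ordering point, here is a small sketch using only the standard library. It is not the poster's actual loader; the function name prefetch and the trivial "loading" step are assumptions. With num_threads=1 the output order always matches the input order, while with more threads the order depends on thread scheduling:

```python
import queue
import threading


def prefetch(samples, num_threads=1):
    """Load samples on background threads and return them in queue order."""
    pending = queue.Queue()
    for sample in samples:
        pending.put(sample)

    loaded = queue.Queue()

    def worker():
        while True:
            try:
                sample = pending.get_nowait()
            except queue.Empty:
                return  # nothing left to load
            # A real loader would read and preprocess the sample here.
            loaded.put(sample)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # All workers have finished, so qsize() is exact at this point.
    return [loaded.get() for _ in range(loaded.qsize())]


order = prefetch(range(10), num_threads=1)
print(order == list(range(10)))
```

With several threads the same samples still arrive, just not necessarily in the same order, which is enough to change training results from run to run.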
What’s more, if the single pre-fetching thread is not the main thread (i.e. it runs in parallel with the main thread) and both threads draw random numbers, make sure the two threads use different random number generators, each with its own seed. Otherwise, the order in which they access a shared generator may differ between runs. To get separate generators, the main thread can, for example, set the seed with
numpy.random.seed(seed) and use
numpy.random.uniform() to generate a random value; the pre-fetching thread creates its own generator with
prng = numpy.random.RandomState(seed) and generates values like this
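A minimal sketch of that two-generator setup, assuming one worker thread built with the threading package. The function name run_once and the choice of seed + 1 for the worker are illustrative assumptions, not taken from the original code:

```python
import threading

import numpy as np


def run_once(seed, n_values=3):
    """Draw values on the main thread and on one worker thread."""
    results = {}

    # Main thread: seed and use NumPy's global generator.
    np.random.seed(seed)

    def prefetch_worker():
        # The worker owns a private generator, so draws made by the main
        # thread cannot shift the sequence it sees.
        # seed + 1 is an arbitrary way to give the worker its own seed.
        prng = np.random.RandomState(seed + 1)
        results["worker"] = [float(prng.uniform()) for _ in range(n_values)]

    worker = threading.Thread(target=prefetch_worker)
    worker.start()
    results["main"] = [float(np.random.uniform()) for _ in range(n_values)]
    worker.join()
    return results


print(run_once(1234) == run_once(1234))
```

Because neither thread touches the other's generator, the values each one sees do not depend on how the two threads happen to interleave.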
BTW, I implemented the multi-threading myself using the threading package, not the official one.