I have 8 tasks running on 8 GPUs, potentially concurrently, through a Jupyter notebook. I would like each task to be deterministic, so I call torch.cuda.manual_seed_all(seed) at the beginning of each task. However, the tasks are not deterministic, and I believe this is because the GPUs are potentially being re-seeded at runtime when another task is launched. Am I correct in this assumption? If so, how do I fix this behavior? If not, what’s the issue? Thanks so much!
Based on the Reproducibility docs, you would have to disable cudnn.benchmark and set cudnn.deterministic=True, besides seeding.
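A minimal sketch of how this could look at the start of each task (the seed value is just a placeholder):

```python
import torch

seed = 42  # placeholder; use whatever seed each task needs

# Seed the CPU RNG and the RNG on every visible GPU
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)

# Disable the cuDNN autotuner (it can pick different kernels from run
# to run) and request deterministic cuDNN algorithms instead
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True
```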
Also, note that some CUDA operations are non-deterministic:
There are some PyTorch functions that use CUDA functions that can be a source of non-determinism. One class of such CUDA functions are atomic operations, in particular atomicAdd, where the order of parallel additions to the same value is undetermined and, for floating-point variables, a source of variance in the result. PyTorch functions that use atomicAdd in the forward include torch.Tensor.index_add_(), torch.Tensor.scatter_add_(), torch.bincount().

A number of operations have backwards that use atomicAdd, in particular torch.nn.functional.embedding_bag(), torch.nn.functional.ctc_loss() and many forms of pooling, padding, and sampling. There currently is no simple way of avoiding non-determinism in these functions.
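If you want to see this effect in isolation, here is a small sketch (assuming a CUDA device is available) that runs torch.Tensor.index_add_() several times on identical inputs; any spread in the results comes from the undetermined order of the parallel atomicAdd calls, not from seeding:

```python
import torch

torch.manual_seed(0)
src = torch.randn(100_000, device="cuda")
index = torch.randint(0, 10, (100_000,), device="cuda")

results = []
for _ in range(5):
    out = torch.zeros(10, device="cuda")
    out.index_add_(0, index, src)  # uses atomicAdd under the hood
    results.append(out)

# A non-zero std shows run-to-run variance despite identical inputs;
# depending on your hardware and kernel, the differences may also be zero.
print(torch.stack(results).std(dim=0))
```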
I’m particularly concerned about accidentally re-seeding all 8 GPUs while another task is in the process of running, thus introducing non-deterministic behavior due to the seed reset. Is this possible?
The seed state should be local to the current application, i.e. the process, so a task launched in another process shouldn’t be able to re-seed your GPUs.
Did you see any weird behavior?
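That said, if your tasks do share a single process (e.g. one Jupyter kernel), one option would be to seed only the device each task actually uses instead of calling torch.cuda.manual_seed_all(). A sketch, where seed_task, gpu_id, and seed are hypothetical names for illustration:

```python
import torch

def seed_task(gpu_id: int, seed: int) -> None:
    """Seed only the RNGs this task touches (hypothetical helper)."""
    torch.manual_seed(seed)            # CPU RNG (process-wide)
    with torch.cuda.device(gpu_id):
        torch.cuda.manual_seed(seed)   # RNG of this GPU only
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True

seed_task(gpu_id=0, seed=42)
```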