Dropout: non-deterministic on different GPUs

matthew_zeng · September 5, 2019, 7:11am

The simple code below yields different results on different GPUs (e.g., a 1070 gives 998874 but a 1080Ti gives 998836). Did I do something wrong, or is it simply impossible to get the same result on different GPUs?

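import torch
import numpy as np
import random

seed = 0
# Ask cuDNN for deterministic algorithms and disable its autotuner.
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True
# Seed every random number generator that could be involved.
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
torch.cuda.manual_seed_all(seed)  # if you are using multi-GPU.
np.random.seed(seed)  # Numpy module.
random.seed(seed)  # Python random module.


# Apply dropout to a tensor of ones on the GPU.
a = torch.ones(1000,1000).to('cuda:0')
dropout = torch.nn.Dropout(0.5).cuda()
b = dropout(a)

# The sum depends on how many elements were kept, so it exposes the mask.
print(torch.sum(torch.abs(b)))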

ptrblck · September 5, 2019, 11:43am

Are you using the same PyTorch version (and the same CUDA and cuDNN versions)?
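
One way to check this on each machine is to print the relevant versions next to the device name. The snippet below is only a minimal sketch: `describe_environment` is a hypothetical helper name, not a PyTorch API, and it assumes device index 0 is the GPU being tested.

```python
import torch


def describe_environment():
    """Collect the version details that could affect GPU random numbers."""
    info = {
        "torch": torch.__version__,
        "cuda": torch.version.cuda,               # None on CPU-only builds
        "cudnn": torch.backends.cudnn.version(),  # None if cuDNN is unavailable
        "device": None,
        "capability": None,
    }
    if torch.cuda.is_available():
        # Assumption: the GPU under test is device 0.
        info["device"] = torch.cuda.get_device_name(0)
        info["capability"] = torch.cuda.get_device_capability(0)
    return info


print(describe_environment())
```

Comparing the printed dictionaries from two machines shows at a glance whether only the GPU model differs.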

Getting the same “random” numbers on different hardware is sometimes quite hard.
However, using your code, I get the same result (tensor(1000260., device='cuda:0')) for:

PyTorch 1.2.0, CUDA10.0.130, cudnn7602, TitanV
PyTorch master ~few weeks old, CUDA10.1.168, cudnn7601, V100
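
If identical masks across GPU models are really needed, one possible workaround is to sample the dropout mask on the CPU and move it to the GPU afterwards, so the GPU generator is not involved at all. This is a sketch and not what `torch.nn.Dropout` does internally; `cpu_seeded_dropout` is a hypothetical helper, and it assumes the CPU generator produces the same stream on both machines (likely with the same PyTorch build, but not guaranteed across releases or platforms).

```python
import torch


def cpu_seeded_dropout(x, p=0.5, seed=0):
    """Inverted dropout whose mask is sampled on the CPU, then moved to x's device."""
    gen = torch.Generator(device="cpu")
    gen.manual_seed(seed)
    # Keep an element when its uniform sample is >= p (so each is dropped with probability p).
    keep = torch.rand(x.shape, generator=gen) >= p
    keep = keep.to(device=x.device, dtype=x.dtype)
    # Rescale the kept elements by 1 / (1 - p), as nn.Dropout does in training mode.
    return x * keep / (1.0 - p)


device = "cuda:0" if torch.cuda.is_available() else "cpu"
a = torch.ones(1000, 1000, device=device)
b = cpu_seeded_dropout(a, p=0.5, seed=0)
print(torch.sum(torch.abs(b)))
```

The trade-off is an extra host-to-device copy per call, so this is probably only worth it for debugging or for tests that compare runs across machines.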

matthew_zeng · September 5, 2019, 1:08pm

Yes, the same environment. I don’t have a TitanV to try but I guess it is quite similar to V100 so they could yield the same result.

More tests:

My local server (PyTorch 1.2.0, CUDA 10.0.130, cuDNN 7602, 2080Ti) got 998908.
Instances on Google Cloud using the official PyTorch 1.2.0 image (exactly the same versions as above) got 1000260 on a V100 but 999100 on a K80.

ptrblck · September 5, 2019, 1:11pm

Yeah, that might be the case. However, I would assume the 1070 and 1080Ti are also similar to each other (same architecture).

yezhe · September 5, 2019, 1:19pm

My K80 also got 999100. My two 1080s on quite different machines both got 998662.

matthew_zeng · September 5, 2019, 1:22pm

I checked again: my 1070 and 1080Ti machines have the same CUDA, cuDNN, and PyTorch versions, but their results are different …