The simple code below yields different results on different GPUs (e.g., a 1070 got 998874 but a 1080Ti got 998836). I wonder if I did something wrong, or if it is simply impossible to get the same result on different GPUs?
import random

import numpy as np
import torch

seed = 0
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True
torch.cuda.manual_seed_all(seed)  # if you are using multi-GPU.
np.random.seed(seed)  # Numpy module.
random.seed(seed)  # Python random module.

a = torch.ones(1000, 1000).to('cuda:0')
dropout = torch.nn.Dropout(0.5).cuda()
b = dropout(a)
print(b.sum())  # this sum is the number compared across GPUs
Are you using the same PyTorch version (and the same CUDA and cudnn versions)?
Getting the same “random” numbers on different hardware is sometimes quite hard.
However, using your code, I get the same result (tensor(1000260., device='cuda:0')) for:
- PyTorch 1.2.0, CUDA10.0.130, cudnn7602, TitanV
- PyTorch master ~few weeks old, CUDA10.1.168, cudnn7601, V100
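In case it helps with comparing setups, something like this prints the relevant version info (these are all standard PyTorch calls):

import torch

print(torch.__version__)               # PyTorch version, e.g. 1.2.0
print(torch.version.cuda)              # CUDA version PyTorch was built against
print(torch.backends.cudnn.version())  # cudnn version, e.g. 7602
print(torch.cuda.get_device_name(0))   # GPU model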
Yes, the same environment. I don't have a TitanV to try, but I guess it is quite similar to the V100, so they could yield the same result.
My local server (PyTorch 1.2.0, CUDA 10.0.130, cudnn7602, 2080Ti) got 998908.
Instances on Google Cloud using the official PyTorch 1.2.0 image (exactly the same versions as above) got 1000260 on a V100 but 999100 on a K80.
Yeah, that might be the case. However, I would assume the 1070 and 1080Ti are also similar to each other (same architecture).
My K80 also got 999100. My two 1080s on quite different machines both got 998662.
I checked again: my 1070 and 1080Ti machines have the same CUDA, cudnn, and PyTorch versions, but their results are different …
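A possible workaround, if you need identical dropout results across GPU models: draw the mask with the CPU generator (whose sequence does not depend on the GPU model) and move it to the device. This is just a sketch of the idea, not the behavior of nn.Dropout itself, so it won't reproduce the numbers above, but it should give the same result on every GPU for a given PyTorch version:

import torch

torch.manual_seed(0)  # seed the CPU generator; its sequence is device-independent

p = 0.5
a = torch.ones(1000, 1000, device='cuda:0')

# Sample the keep/drop mask on the CPU, then move it to the GPU.
mask = (torch.rand(a.shape) >= p).float().to(a.device)

# Inverted dropout: scale the kept activations by 1/(1-p) at training time.
b = a * mask / (1 - p)
print(b.sum())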