Hey PyTorch team. Big fan of PyTorch, great work!
I'm having trouble understanding when torch.rand produces the same values across devices. I ran the code below on a 3090, an A100, and an H100:
import torch
import numpy as np
import random

# Seed every RNG source, just to rule them out.
torch.use_deterministic_algorithms(True)
torch.manual_seed(0)
torch.cuda.manual_seed(0)
np.random.seed(0)
random.seed(0)

# Sample from a dedicated CUDA generator seeded with 10.
a = torch.rand(1_000_000, device='cuda',
               generator=torch.Generator(device='cuda').manual_seed(10))

# Print the values straddling indices 32 * 48 * 82 = 125952
# and 32 * 48 * 144 = 221184.
print(a[32 * 48 * 82 - 2: 32 * 48 * 82 + 2])
print(a[32 * 48 * 144 - 2: 32 * 48 * 144 + 2])
3090 output:
tensor([0.5327, 0.4037, 0.5954, 0.2597], device='cuda:0')
tensor([0.6247, 0.3700, 0.1930, 0.9421], device='cuda:0')
A100 output:
tensor([0.5327, 0.4037, 0.4867, 0.9688], device='cuda:0')
tensor([0.1685, 0.8799, 0.5954, 0.2597], device='cuda:0')
H100 output:
tensor([0.5327, 0.4037, 0.4867, 0.9688], device='cuda:0')
tensor([0.1685, 0.8799, 0.7291, 0.4280], device='cuda:0')
All three devices produce the same values up to index 125952 (= 32 * 48 * 82); beyond that the 3090 diverges from the other two. The A100 and H100 keep agreeing until index 221184 (= 32 * 48 * 144), where they start differing as well.
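One way to pin down the exact index where two runs diverge, assuming the tensor from each device was saved with torch.save (the filenames below are just placeholders):

import torch

# Load tensors saved from two different devices (placeholder filenames).
a = torch.load('rand_a100.pt', map_location='cpu')
b = torch.load('rand_h100.pt', map_location='cpu')

# Index of the first element that differs between the two runs.
mismatch = (a != b).nonzero(as_tuple=False)
print(mismatch[0].item() if mismatch.numel() else 'identical')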
Is this behavior intended?
Is the only way to get identical results across devices for larger tensor sizes to generate smaller torch.rand chunks and torch.cat them together?
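Something along these lines, as a minimal sketch; the 2**16 chunk size and the per-chunk re-seeding scheme are my own guesses, not anything documented:

import torch

def chunked_rand(n, chunk=2**16, seed=10, device='cuda'):
    # Re-seed a fresh generator per chunk so the stream does not depend
    # on how a previous launch advanced the generator's internal offset,
    # and keep each launch small enough to (hopefully) fill identically
    # on every GPU.
    pieces = []
    for i in range(0, n, chunk):
        g = torch.Generator(device=device).manual_seed(seed + i)
        pieces.append(torch.rand(min(chunk, n - i), device=device, generator=g))
    return torch.cat(pieces)

a = chunked_rand(1_000_000)

But this seems fragile, since whatever chunk size is "safe" presumably varies by GPU, which is why I'd like to understand the intended behavior.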