Hey PyTorch team. Big fan of PyTorch! Great work.
I'm having trouble understanding when torch.rand produces identical results across devices. I ran the code below on a 3090, an A100, and an H100:
import torch
import numpy as np
import random
torch.use_deterministic_algorithms(True)
torch.manual_seed(0)
torch.cuda.manual_seed(0)
np.random.seed(0)
random.seed(0)
a = torch.rand(1_000_000, device='cuda', generator=torch.Generator(device='cuda').manual_seed(10))
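# 32 * 48 * 82 = 125952 and 32 * 48 * 144 = 221184, the divergence points noted below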
print(a[32 * 48 * 82 - 2: 32 * 48 * 82 + 2])
print(a[32 * 48 * 144 - 2: 32 * 48 * 144 + 2])
3090 output
tensor([0.5327, 0.4037, 0.5954, 0.2597], device='cuda:0')
tensor([0.6247, 0.3700, 0.1930, 0.9421], device='cuda:0')
A100 output
tensor([0.5327, 0.4037, 0.4867, 0.9688], device='cuda:0')
tensor([0.1685, 0.8799, 0.5954, 0.2597], device='cuda:0')
H100 output
tensor([0.5327, 0.4037, 0.4867, 0.9688], device='cuda:0')
tensor([0.1685, 0.8799, 0.7291, 0.4280], device='cuda:0')
All three devices produce the same values up to (but not including) index 125952 (= 32 * 48 * 82), where the 3090 first diverges. The A100 and H100 agree past that point but start differing at index 221184 (= 32 * 48 * 144).
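For reference, the divergence point can be located by saving a from each device and diffing; a minimal sketch (the filenames are hypothetical):

import torch

# Assumes each device ran the script above and saved its result,
# e.g. torch.save(a.cpu(), 'a_3090.pt').
a_3090 = torch.load('a_3090.pt')
a_a100 = torch.load('a_a100.pt')

mismatch = (a_3090 != a_a100).nonzero()
# Prints 125952 given the outputs above.
print(mismatch[0].item() if mismatch.numel() else 'identical')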
Is this behavior intended?
Is the only way to get identical results across devices for larger tensor sizes to generate smaller rand tensors and torch.cat the results?
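In other words, something like this (a minimal sketch; the chunk size is arbitrary, and I haven't verified that it actually matches across devices):

import torch

def rand_chunked(n, chunk=65536, seed=10, device='cuda'):
    # Draw n values as fixed-size chunks from one generator, then cat.
    # The hope is that each small rand call uses the same kernel launch
    # configuration on every device, unlike one big rand call.
    g = torch.Generator(device=device).manual_seed(seed)
    parts = []
    remaining = n
    while remaining > 0:
        k = min(chunk, remaining)
        parts.append(torch.rand(k, device=device, generator=g))
        remaining -= k
    return torch.cat(parts)

a = rand_chunked(1_000_000)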