Why does torch.rand differ across devices at large indices?

Hey PyTorch team. Big fan of PyTorch! Great work :fire:.

I'm having trouble understanding when torch.rand produces the same values across devices.
I ran the code below on a 3090, an A100, and an H100:

import torch
import numpy as np
import random

torch.use_deterministic_algorithms(True)
torch.manual_seed(0)
torch.cuda.manual_seed(0)
np.random.seed(0)
random.seed(0)

a = torch.rand(1_000_000, device='cuda', generator=torch.Generator(device='cuda').manual_seed(10))

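# inspect the values around indices 125952 (32 * 48 * 82) and 221184 (32 * 48 * 144)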
print(a[32 * 48 * 82 - 2: 32 * 48 * 82 + 2])
print(a[32 * 48 * 144 - 2: 32 * 48 * 144 + 2])

3090 output

tensor([0.5327, 0.4037, 0.5954, 0.2597], device='cuda:0')
tensor([0.6247, 0.3700, 0.1930, 0.9421], device='cuda:0')

A100 output

tensor([0.5327, 0.4037, 0.4867, 0.9688], device='cuda:0')
tensor([0.1685, 0.8799, 0.5954, 0.2597], device='cuda:0')

H100 output

tensor([0.5327, 0.4037, 0.4867, 0.9688], device='cuda:0')
tensor([0.1685, 0.8799, 0.7291, 0.4280], device='cuda:0')

All three devices produce the same values up to index 125952 (32 * 48 * 82), where the 3090 starts to diverge. The A100 and H100 then start differing from each other at index 221184 (32 * 48 * 144).
Is this behavior intended?

Is generating smaller rand tensors and concatenating them the only way to obtain the same result across devices for larger tensor sizes?
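Something like this is what I have in mind (the chunk size of 65536 is just an arbitrary example, not something I know to be device-independent):

import torch

gen = torch.Generator(device='cuda').manual_seed(10)
chunk = 65_536  # arbitrary chunk size, for illustration only
n = 1_000_000

# generate in smaller pieces and concatenate, then trim to the requested length
pieces = [torch.rand(chunk, device='cuda', generator=gen)
          for _ in range(n // chunk + 1)]
a = torch.cat(pieces)[:n]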

Yes, this behavior is expected: reproducible results are not guaranteed across different setups. If you want to reuse the same pseudorandom values, you could serialize them once and then load them in the application that needs them.
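A minimal sketch of what that could look like (the file name and the choice to generate the reference values on the CPU are just illustrative):

import torch

# generate the values once on a reference machine and serialize them
ref = torch.rand(1_000_000, generator=torch.Generator().manual_seed(10))
torch.save(ref, 'rand_values.pt')

# later, in the application that should use them, load and move to the target device
a = torch.load('rand_values.pt').to('cuda')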
