torch.manual_seed(seed) behaviour in torch

I'm trying to generate random samples of increasing sizes within a for-loop. If you run the code below, you'll see that the outputs for sizes 20, 30 and 40 start with the same values, but the output for size 10 is completely different.

    import numpy as np
    import torch

    n_sizes = [10, 20, 30, 40]

    for i in np.arange(len(n_sizes)):
        n_train = n_sizes[i]
        torch.manual_seed(4321)
        print(torch.randn(n_train))
tensor([ 0.7602,  0.0206, -0.5338, -0.9620, -1.7630,  0.4865,  2.1059, -0.5918,
        -1.2425, -0.7120])
tensor([-0.4716, -0.3436, -1.1742,  0.1221, -0.0493,  0.0144, -0.3218, -0.1144,
         2.0888,  0.8194, -0.8230, -0.4696, -1.0848,  0.4038,  1.6026,  2.4233,
         0.0374,  1.4740, -1.8088, -0.8935])
tensor([-0.4716, -0.3436, -1.1742,  0.1221,  1.3231, -0.6415,  0.8538, -1.8969,
         0.2142,  1.1937, -0.8704,  0.2439, -0.0453,  1.4172, -0.8123, -0.1934,
        -0.0318,  0.6023,  1.4111,  1.3920,  0.1090, -1.5328,  0.0391,  0.7470,
         0.0686,  0.0899,  0.5482, -0.5459, -0.5019,  1.1048])
tensor([-0.4716, -0.3436, -1.1742,  0.1221,  1.3231, -0.6415,  0.8538, -1.8969,
         0.2142,  1.1937, -0.8704,  0.2439, -0.0453,  1.4172, -0.0614,  1.5471,
         1.4126,  0.0268,  0.5757, -0.8794, -0.0493,  0.0144, -0.3218, -0.1144,
        -0.6089, -0.1303,  0.1426,  1.6467,  0.8824, -0.8752, -0.4935,  0.4820,
        -0.6308, -0.1754,  0.3182,  1.7125, -1.5122,  0.5076,  0.1487,  0.4369])

Hi,

This is most likely because the logic used to sample a large batch of numbers is not simply the logic for sampling a small batch, run multiple times.
From a quick look at the code, it uses different kernels so that it can make use of vectorized instructions in particular.
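
If you want the smaller samples to be prefixes of the larger ones, one workaround (just a sketch, not something the seed alone guarantees) is to seed once, draw the largest sample, and slice it:

    import torch

    n_sizes = [10, 20, 30, 40]

    torch.manual_seed(4321)
    full = torch.randn(max(n_sizes))   # draw the largest sample once

    for n_train in n_sizes:
        print(full[:n_train])          # smaller samples are prefixes by construction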

Behaviour in numpy is as expected:

    import numpy as np

    n_sizes = [10, 20, 30, 40]

    for i in np.arange(len(n_sizes)):
        n_train = n_sizes[i]
        np.random.seed(4321)
        print(np.random.normal(size=n_train))
[-0.76652152  0.96119469  1.45634699 -0.52979269 -0.26476741 -0.57217131
 -0.75111347  2.22821657  0.65734057 -1.13237756]
[-0.76652152  0.96119469  1.45634699 -0.52979269 -0.26476741 -0.57217131
 -0.75111347  2.22821657  0.65734057 -1.13237756  0.3301812   0.1310132
  0.80145179 -1.46271304 -1.41138966 -1.6531237  -0.14342971  0.51649005
  1.28008742 -1.2251386 ]
[-0.76652152  0.96119469  1.45634699 -0.52979269 -0.26476741 -0.57217131
 -0.75111347  2.22821657  0.65734057 -1.13237756  0.3301812   0.1310132
  0.80145179 -1.46271304 -1.41138966 -1.6531237  -0.14342971  0.51649005
  1.28008742 -1.2251386  -0.77911324 -0.12907887  0.55540811  0.41840177
  0.3293098   1.140024   -0.00801868  0.76995465 -1.35304484 -0.65235036]
[-0.76652152  0.96119469  1.45634699 -0.52979269 -0.26476741 -0.57217131
 -0.75111347  2.22821657  0.65734057 -1.13237756  0.3301812   0.1310132
  0.80145179 -1.46271304 -1.41138966 -1.6531237  -0.14342971  0.51649005
  1.28008742 -1.2251386  -0.77911324 -0.12907887  0.55540811  0.41840177
  0.3293098   1.140024   -0.00801868  0.76995465 -1.35304484 -0.65235036
 -1.65033614  0.98659412  1.74636899 -0.29749675  0.36506456 -1.4723392
  0.53323632 -1.62450975 -1.35767072 -0.76749929]

I don't see why the sample for size 10 should be different from the sample for size 20 (with 10 new values tacked onto the end) when the seed is being reset every time.

As you can see, the difference appears when you reach a size of 16:

In []: for n_train in range(1, 20):
   ...:     torch.manual_seed(4321)
   ...:     print(n_train, torch.randn(n_train)[0])
   ...:
1 tensor(0.7602)
2 tensor(0.7602)
3 tensor(0.7602)
4 tensor(0.7602)
5 tensor(0.7602)
6 tensor(0.7602)
7 tensor(0.7602)
8 tensor(0.7602)
9 tensor(0.7602)
10 tensor(0.7602)
11 tensor(0.7602)
12 tensor(0.7602)
13 tensor(0.7602)
14 tensor(0.7602)
15 tensor(0.7602)
16 tensor(-0.4716)
17 tensor(-0.4716)
18 tensor(-0.4716)
19 tensor(-0.4716)

This happens because we only use vectorized operations for Tensors with 16 or more elements: here.
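
The results are still deterministic for a fixed call, though: the same seed with the same size always reproduces the same tensor. A minimal check (just a sketch):

    import torch

    torch.manual_seed(4321)
    a = torch.randn(40)

    torch.manual_seed(4321)
    b = torch.randn(40)

    print(torch.equal(a, b))  # True: same seed and same size give an identical tensor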
