torch.manual_seed(seed) behaviour in torch

I'm trying to generate random samples of increasing sizes within a for-loop. If you run the code below, you'll see that the outputs for sizes 20, 30 and 40 start with the same values, but the output for size 10 is completely different.

    import numpy as np
    import torch

    n_sizes = [10, 20, 30, 40]

    for i in np.arange(len(n_sizes)):
        n_train = n_sizes[i]
        torch.manual_seed(4321)
        print(torch.randn(n_train))
tensor([ 0.7602,  0.0206, -0.5338, -0.9620, -1.7630,  0.4865,  2.1059, -0.5918,
        -1.2425, -0.7120])
tensor([-0.4716, -0.3436, -1.1742,  0.1221, -0.0493,  0.0144, -0.3218, -0.1144,
         2.0888,  0.8194, -0.8230, -0.4696, -1.0848,  0.4038,  1.6026,  2.4233,
         0.0374,  1.4740, -1.8088, -0.8935])
tensor([-0.4716, -0.3436, -1.1742,  0.1221,  1.3231, -0.6415,  0.8538, -1.8969,
         0.2142,  1.1937, -0.8704,  0.2439, -0.0453,  1.4172, -0.8123, -0.1934,
        -0.0318,  0.6023,  1.4111,  1.3920,  0.1090, -1.5328,  0.0391,  0.7470,
         0.0686,  0.0899,  0.5482, -0.5459, -0.5019,  1.1048])
tensor([-0.4716, -0.3436, -1.1742,  0.1221,  1.3231, -0.6415,  0.8538, -1.8969,
         0.2142,  1.1937, -0.8704,  0.2439, -0.0453,  1.4172, -0.0614,  1.5471,
         1.4126,  0.0268,  0.5757, -0.8794, -0.0493,  0.0144, -0.3218, -0.1144,
        -0.6089, -0.1303,  0.1426,  1.6467,  0.8824, -0.8752, -0.4935,  0.4820,
        -0.6308, -0.1754,  0.3182,  1.7125, -1.5122,  0.5076,  0.1487,  0.4369])

Hi,

This is most likely because the logic used to sample a large batch of numbers is not simply the logic for sampling a small batch, run multiple times.
From a quick look at the code, it uses different kernels so that it can make use of vectorized instructions in particular.
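
If you want the smaller samples to be prefixes of the larger ones, one workaround (just a sketch, not something the seed alone guarantees) is to seed once, draw the largest sample, and slice it:

    import torch

    n_sizes = [10, 20, 30, 40]

    torch.manual_seed(4321)
    full = torch.randn(max(n_sizes))   # draw the largest sample once

    for n_train in n_sizes:
        print(full[:n_train])          # smaller samples are prefixes by construction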

Behaviour in numpy is as expected:

    import numpy as np

    n_sizes = [10, 20, 30, 40]

    for i in np.arange(len(n_sizes)):
        n_train = n_sizes[i]
        np.random.seed(4321)
        print(np.random.normal(size=n_train))
[-0.76652152  0.96119469  1.45634699 -0.52979269 -0.26476741 -0.57217131
 -0.75111347  2.22821657  0.65734057 -1.13237756]
[-0.76652152  0.96119469  1.45634699 -0.52979269 -0.26476741 -0.57217131
 -0.75111347  2.22821657  0.65734057 -1.13237756  0.3301812   0.1310132
  0.80145179 -1.46271304 -1.41138966 -1.6531237  -0.14342971  0.51649005
  1.28008742 -1.2251386 ]
[-0.76652152  0.96119469  1.45634699 -0.52979269 -0.26476741 -0.57217131
 -0.75111347  2.22821657  0.65734057 -1.13237756  0.3301812   0.1310132
  0.80145179 -1.46271304 -1.41138966 -1.6531237  -0.14342971  0.51649005
  1.28008742 -1.2251386  -0.77911324 -0.12907887  0.55540811  0.41840177
  0.3293098   1.140024   -0.00801868  0.76995465 -1.35304484 -0.65235036]
[-0.76652152  0.96119469  1.45634699 -0.52979269 -0.26476741 -0.57217131
 -0.75111347  2.22821657  0.65734057 -1.13237756  0.3301812   0.1310132
  0.80145179 -1.46271304 -1.41138966 -1.6531237  -0.14342971  0.51649005
  1.28008742 -1.2251386  -0.77911324 -0.12907887  0.55540811  0.41840177
  0.3293098   1.140024   -0.00801868  0.76995465 -1.35304484 -0.65235036
 -1.65033614  0.98659412  1.74636899 -0.29749675  0.36506456 -1.4723392
  0.53323632 -1.62450975 -1.35767072 -0.76749929]

I don't see why the sample for size 10 should be different from the sample for size 20 (with 10 new values tacked onto the end) when the seed is being reset every time.

As you can see, the difference appears when you reach a size of 16:

In []: for n_train in range(1, 20):
   ...:     torch.manual_seed(4321)
   ...:     print(n_train, torch.randn(n_train)[0])
   ...:
1 tensor(0.7602)
2 tensor(0.7602)
3 tensor(0.7602)
4 tensor(0.7602)
5 tensor(0.7602)
6 tensor(0.7602)
7 tensor(0.7602)
8 tensor(0.7602)
9 tensor(0.7602)
10 tensor(0.7602)
11 tensor(0.7602)
12 tensor(0.7602)
13 tensor(0.7602)
14 tensor(0.7602)
15 tensor(0.7602)
16 tensor(-0.4716)
17 tensor(-0.4716)
18 tensor(-0.4716)
19 tensor(-0.4716)

This happens because we only use vectorized operations for Tensors with 16 or more elements: here.
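
The results are still deterministic for a fixed call, though: the same seed with the same size always reproduces the same tensor. A minimal check (just a sketch):

    import torch

    torch.manual_seed(4321)
    a = torch.randn(40)

    torch.manual_seed(4321)
    b = torch.randn(40)

    print(torch.equal(a, b))  # True: same seed and same size give an identical tensor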
