Issues with reproducibility

Hi folks, I’m trying to understand some aspects of reproducibility and it seems that I might not have understood something properly.

I have a toy multilayer perceptron and every time I train and test it I get different results regarding test accuracy.

I’ve seeded all possible sources of randomness I could think of:

import os
import random

import numpy as np
import torch

seed = 2021
os.environ['PYTHONHASHSEED'] = '2021'
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.backends.cudnn.benchmark = False  # note: "benchmark", not "benchmarks"
torch.backends.cudnn.deterministic = True
torch.cuda.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
torch.set_deterministic(True)  # torch.use_deterministic_algorithms(True) in newer releases

But I still observe this discrepancy in accuracy every time I re-train and test the model.

I suspect this has to do with the fact that every time the model is constructed the weights are randomly initialised, hence the different results in accuracy.

It seems like none of the above seeds fixes the weights of the network for reproducibility. Am I understanding this wrong, or did I miss something crucial regarding reproducibility? (And yes, I’ve read the docs about reproducibility.)

Seeding the code should use the same “random” values to initialize the model.
If you think the model parameters are still different, you could save the state_dict of two different runs and compare the values.

In case the parameters are indeed equal, you could then add debug print statements and try to narrow down which iteration produces different results.
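
For the comparison step, a minimal sketch could look like this (the MLP class and the file names are just placeholders for your actual setup):

import torch
import torch.nn as nn

# placeholder for your actual model definition
class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))

    def forward(self, x):
        return self.layers(x)

# in each run, save the freshly initialised parameters
torch.manual_seed(2021)
model = MLP()
torch.save(model.state_dict(), 'run1_init.pth')  # use 'run2_init.pth' in the second run

# afterwards, load both files and compare them parameter by parameter
sd1 = torch.load('run1_init.pth')
sd2 = torch.load('run2_init.pth')
for name in sd1:
    if not torch.equal(sd1[name], sd2[name]):
        print(f'mismatch in {name}')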

Thanks @ptrblck,
from a simple experiment with numpy and seeds, it looks like, no matter whether you set the seed or not, consecutive calls to numpy.random.randn will yield different results.

In that case, why do we expect that creating multiple instances of the model (e.g. modelA = MLP(), modelB = MLP()) will give the same weights? I would expect modelA and modelB to have different weights due to random initialisation, no?

We don’t, and it’s expected that sequential calls to the pseudo-random number generator will yield different results. Otherwise you would be sampling constant values, which is wrong.

Seeding the code makes sure that the sequence of random numbers is equal between runs.

np.random.seed(2809)
print(np.random.randn(10))
> [ 0.23747478  0.56083014 -1.1650379   0.12482746  0.04529476 -0.60137682
 -0.18559365  0.57410855 -0.53776209 -0.03972935]

print(np.random.randn(10))
> [-0.85086083  0.25095493  0.30150578  1.15011173  0.36414967  0.3197194
 -1.60004496 -0.74648366  0.65739336  0.15710605]

np.random.seed(2809)
print(np.random.randn(10))
> [ 0.23747478  0.56083014 -1.1650379   0.12482746  0.04529476 -0.60137682
 -0.18559365  0.57410855 -0.53776209 -0.03972935]

print(np.random.randn(10))
> [-0.85086083  0.25095493  0.30150578  1.15011173  0.36414967  0.3197194
 -1.60004496 -0.74648366  0.65739336  0.15710605]

The Wikipedia article on pseudorandom number generators explains it in more detail.

Thanks for the example, but I’m not sure how you achieved random numbers that are equal between runs. For instance, in the example below they all seem different to me, no?

>>> import os
>>> import random
>>> import numpy as np
>>> os.environ['PYTHONHASHSEED'] = '2021'
>>> random.seed(2021)
>>> np.random.seed(2021)
>>> np.random.randn(3,3)
array([[ 1.48860905,  0.67601087, -0.41845137],
       [-0.80652081,  0.55587583, -0.70550429],
       [ 1.13085826,  0.64500184,  0.10641374]])
>>> np.random.randn(3,3)
array([[ 0.42215483,  0.12420684, -0.83795346],
       [ 0.4090157 ,  0.10275122, -1.90772239],
       [ 1.1002243 , -1.40232506, -0.22508127]])
>>> np.random.randn(3,3)
array([[-1.33620597,  0.30372151, -0.72015884],
       [ 2.5449146 ,  1.31729112,  0.0726303 ],
       [-0.25610814,  0.13801041,  1.14723599]])
>>> np.random.randn(3,3)
array([[ 1.37626076, -0.47218397,  0.5240849 ],
       [ 1.48510793,  1.48243225,  0.72813051],
       [-0.38964808,  0.27889376,  0.0519002 ]])
>>> np.random.randn(3,3)
array([[-1.04474368, -0.16150753, -2.79353057],
       [ 0.36164801,  0.24010758,  0.47781228],
       [ 0.20885194,  0.91791163, -1.41177784]])
>>> np.random.randn(3,3)
array([[ 1.22423572, -0.54565772,  0.90674805],
       [-0.98261724, -0.63316189,  0.82552024],
       [-1.28449686, -0.34730878, -1.07776075]])
>>> np.random.randn(3,3)
array([[ 0.97257495,  0.41219116, -0.13208604],
       [-0.67335092,  1.22222171, -0.92641342],
       [ 1.4249938 , -0.12478993,  0.70549192]])
>>> np.random.randn(3,3)
array([[ 0.7195475 ,  0.14146394, -0.65444848],
       [-0.67210417,  0.87036849,  1.00324527],
       [-0.36644983,  1.12805168,  0.7925838 ]])
>>> np.random.randn(3,3)
array([[-1.75084239, -0.80886945, -1.41260181],
       [-0.8030097 ,  0.79828047,  1.75599622],
       [-0.0755042 ,  0.42017682,  0.15655606]])

Terminal 1:

>>> import numpy as np
>>> np.random.seed(2809)
>>> 
>>> print(np.random.randn(10))
[ 0.23747478  0.56083014 -1.1650379   0.12482746  0.04529476 -0.60137682
 -0.18559365  0.57410855 -0.53776209 -0.03972935]
>>>

Terminal 2:

>>> import numpy as np
>>> np.random.seed(2809)
>>> 
>>> print(np.random.randn(10))
[ 0.23747478  0.56083014 -1.1650379   0.12482746  0.04529476 -0.60137682
 -0.18559365  0.57410855 -0.53776209 -0.03972935]
>>>

In @ptrblck’s example, the np.random.seed method is called again, which “resets” the generator so that the same numbers are generated. Once the seed is set, every consecutive call will still produce different numbers, but the same sequence of numbers will be generated if you initialize the generator from the same seed in different instances.
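
Applied to the earlier modelA / modelB question, a rough sketch (using a toy model as a stand-in for the actual MLP) of what this means:

import torch
import torch.nn as nn

def make_mlp():
    # toy stand-in for the actual MLP
    return nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))

torch.manual_seed(2021)
modelA = make_mlp()
modelB = make_mlp()  # consumes the next values in the sequence -> different weights
print(torch.equal(modelA[0].weight, modelB[0].weight))  # False

torch.manual_seed(2021)  # "reset" the generator
modelC = make_mlp()      # same weights as modelA
print(torch.equal(modelA[0].weight, modelC[0].weight))  # True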


I see, thanks for the clarification. I missed the 2nd call to np.random.seed() in @ptrblck’s example.

I suppose that works fine for scripts running in a terminal, but maybe it’s problematic with interactive notebooks. I guess calling the seed() function from cell to cell would be quite annoying?

Why would you want to do that? Resetting the seed would break the assumption that you are “randomly” initializing the model, data etc.
If you want to reuse the same parameters, I would rather load the state_dict of another model instead of relying on the seed.
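
A minimal sketch of that approach (the toy model and file name here are just placeholders):

import torch
import torch.nn as nn

# toy stand-in for your model
model_a = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))

# save the parameters once ...
torch.save(model_a.state_dict(), 'mlp_params.pth')

# ... and load them into a new instance instead of relying on the seed
model_b = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
model_b.load_state_dict(torch.load('mlp_params.pth'))

# both models now hold identical parameters
for p_a, p_b in zip(model_a.parameters(), model_b.parameters()):
    assert torch.equal(p_a, p_b)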