Same layer, same initialization method, same seed, different weights

With the same layer, the same initialization method, and the same seed, I run two pieces of code independently but get different weights, which is really confusing.

import random
import os
import numpy as np
import torch
import torch.nn as nn

seed = 2020
random.seed(seed)
os.environ["PYTHONHASHSEED"] = str(seed)
np.random.seed(seed)
torch.manual_seed(seed)
if torch.cuda.is_available():
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.enabled = True
    
    
def get_encoder():
    input_dim = 109
    num_layers = 4  # not shown in the original snippet; assumed to be 4 (see answer below)
    encoder = nn.Sequential()
    input_dims = [input_dim] + [
        int(i) for i in np.exp(np.log(input_dim) * np.arange(num_layers - 1, 0, -1) / num_layers)
    ]
    for layer_i, (input_dim, output_dim) in enumerate(zip(input_dims[:-1], input_dims[1:])):
        encoder.add_module("fc_" + str(layer_i), nn.Linear(input_dim, output_dim))
        encoder.add_module("fc_" + str(layer_i) + "_act", nn.Softsign())

    model.add_module("output_layer", nn.Linear(n_hiddens, 1))
    model.add_module("output_layer", nn.Linear(1, 1))
    return encoder

model = get_encoder()
nn.init.kaiming_normal_(model.fc_0.weight)

for p in model.parameters():
    print(p.sum())
    break

# output tensor(-12.7479, grad_fn=<SumBackward0>)
import random
import os
import numpy as np
import torch
import torch.nn as nn

seed = 2020
random.seed(seed)
os.environ["PYTHONHASHSEED"] = str(seed)
np.random.seed(seed)
torch.manual_seed(seed)
if torch.cuda.is_available():
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.enabled = True
    
    
model = nn.Linear(109, 33)
nn.init.kaiming_normal_(model.weight)
for n, p in model.named_parameters():
    print(p.sum())
    break

# output tensor(-5.6983, grad_fn=<SumBackward0>)

The determinism also depends on how many times you have called the random number generator. In your example there are two models which are functionally the same. However, their initialization is different, because the number of random values generated after seeding is different in each case.
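
A minimal, self-contained sketch of that effect (the sizes 10 and 5 are arbitrary):

import torch

torch.manual_seed(2020)
_ = torch.rand(10)    # consume 10 random values
print(torch.rand(3))  # prints values 11-13 of the stream

torch.manual_seed(2020)
_ = torch.rand(5)     # consume only 5 values this time
print(torch.rand(3))  # prints values 6-8 -> different numbers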

Here is what you are doing for the first model:

  1. Set the seed
  2. Create linear layers with random weights 4 times (I am assuming num_layers = 4, which gives three encoder layers plus the output layer)
  3. Initialize the weights of the first layer

The number of random values drawn before the kaiming_normal_ call is therefore the total number of elements (weights and biases) in those four linear layers.
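
As a sanity check, you can count how many values a single layer construction consumes (shapes taken from your snippet):

import torch
import torch.nn as nn

torch.manual_seed(2020)
layer = nn.Linear(109, 33)
# weight (33 x 109) + bias (33) -> 3630 elements, each filled from the global
# generator during the default initialization in the constructor
print(sum(p.numel() for p in layer.parameters()))  # 3630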

Now here is what you do in the second model:

  1. Set the seed
  2. Create a single linear layer with random weights
  3. Initialize the weights of that layer

The number of random values drawn before the kaiming_normal_ call here is only the number of elements in that single linear layer, so the generator is in a different state and produces different numbers. If you want the initialization to be completely identical, you will have to either 1) call the linear constructor in the second case the same number of times, with the same arguments, as in the first case, or 2) set the seed right before the kaiming_normal_ call.
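
For example, option 2 applied to your second snippet (adding the same re-seeding line right before kaiming_normal_ in the first snippet makes both prints match):

import torch
import torch.nn as nn

seed = 2020
torch.manual_seed(seed)
model = nn.Linear(109, 33)

torch.manual_seed(seed)  # reset the generator right before the explicit init
nn.init.kaiming_normal_(model.weight)
print(model.weight.sum())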

P.S. If you want the question to get answered faster, you might want to change the tag. This question has nothing to do with quantization, so its visibility is limited to the wrong group :slight_smile: