Layers are not initialized with the same weights with manual seed

When setting all seeds manually, I would expect that all new layers of a given type have the same initial weights. However, that is not the case.

import torch
from torch import nn
import os
import numpy as np
import random

torch.manual_seed(3)
torch.cuda.manual_seed_all(3)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
np.random.seed(3)
random.seed(3)
os.environ['PYTHONHASHSEED'] = str(3)

linear = nn.Linear(5, 2)
linear2 = nn.Linear(5, 2)

print(linear.weight)
print(linear2.weight)

This will print:

Parameter containing:
tensor([[ 0.2083, -0.4267, -0.2550,  0.4328, -0.1910],
        [ 0.1918,  0.1967, -0.3020, -0.1627, -0.1698]], requires_grad=True)
Parameter containing:
tensor([[-0.0255, -0.1283,  0.3900, -0.0621, -0.3761],
        [ 0.1991,  0.3531, -0.1468, -0.1808,  0.4017]], requires_grad=True)

Since the initialization scheme of Linear is the same for every instance, and random effects have been removed by manually setting the seed, how do the weights differ? And what is the reason for them being different?


The reason is that generating numbers changes the state of the random number generator.
If you set the seed back and then create the layer again, you will get the same weights:

import torch
from torch import nn

torch.manual_seed(3)
linear = nn.Linear(5, 2)

torch.manual_seed(3)
linear2 = nn.Linear(5, 2)

print(linear.weight)
print(linear2.weight)
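You can also verify this programmatically, for example:

print(torch.equal(linear.weight, linear2.weight))  # prints True: both layers got identical weights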

Yeah, I figured this out yesterday as well. I keep forgetting how seeding and random generators work. But thanks for the confirmation!


Is there any way to always fix the seed at a certain number so that I don’t have to call torch.manual_seed(3) every single time I initialize an nn.Linear?

The way it is right now makes reproducibility really hard.

If you fixed the seed to a given number at all times, then your random number generator would return the same number every time. I don’t think that is what you want, right?

I do want my random number generator to return the same number all the time in this case. I’m implementing my own transformer and want to make sure that it matches the PyTorch transformer. Is there any way to fix the seed at all times, or another way to accomplish what I want?

Could you explain what you mean by this, please? I am not sure what behavior you expect to see.
Could you give a code sample that shows what you want to get?

Just like what you said above, if I want to initialize two linear layers with the same weights right now, I’d have to do

import torch
from torch import nn

torch.manual_seed(3)
linear = nn.Linear(5, 2)

torch.manual_seed(3)
linear2 = nn.Linear(5, 2)

This forces me to repeat torch.manual_seed(3) everywhere if I have a lot of linear layers and want to make sure they all initialize to the same weights. Is there a function (let’s call it torch.fix_global_seed) such that if I do

import torch
from torch import nn

torch.fix_global_seed(3)
linear = nn.Linear(5, 2)
linear2 = nn.Linear(5, 2)
linear3 = nn.Linear(5, 2)
linear4 = nn.Linear(5, 2)

all four linear layers would be initialized with the same weights?

There is no such function, no.
You don’t actually want to fix the global seed; you want to reset it after each layer is created, right?
If the seed were truly fixed globally, the generator would only ever produce a single number, which is not what you want.

I think the simplest approach here would be to create a custom Linear module that inherits from the existing Linear and sets the seed before calling the original Linear’s initialization.
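A minimal sketch of that idea could look like this (the class name SeededLinear and its seed argument are just made up for illustration, not an existing PyTorch API):

import torch
from torch import nn

class SeededLinear(nn.Linear):
    def __init__(self, in_features, out_features, seed=3, **kwargs):
        # Reset the global RNG state right before nn.Linear.__init__
        # samples the weight and bias tensors.
        torch.manual_seed(seed)
        super().__init__(in_features, out_features, **kwargs)

linear = SeededLinear(5, 2)
linear2 = SeededLinear(5, 2)
print(torch.equal(linear.weight, linear2.weight))  # True: both layers start from the same RNG state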

Oh you mean all the numbers in the Linear’s weight tensor would be 42 if I fix the global seed to 42, right?

The custom Linear Module sounds like a great idea! Thanks!

It might not be that exact value, but yes, it would give you a tensor like [123, 123, 123, 123, 123], for example.
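Here is a small sketch that simulates such a “fixed” seed by resetting it before every single draw:

import torch

values = []
for _ in range(5):
    torch.manual_seed(42)          # simulate a seed that stays fixed before every draw
    values.append(torch.rand(1))   # each draw restarts the sequence, so it returns the same number
print(torch.cat(values))           # a 5-element tensor with all entries identical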

Makes sense. Thanks!

[123, 123, 123, 123, 123]

@albanD Why would I get something like that? The seed defines the initialization of the random sequence, so one seed should generate a sequence of numbers, allowing for tensors of the following form, for example:

[123, 523, 102, 12, 36]

and across different linear layers they should remain the same, which means

linear.weight == linear2.weight

Hi,

Setting the seed would have the behavior you describe, yes.
The question above (if I recall correctly) was about forcing the seed to remain at a given value, which would be like setting it back after each number is generated. That would lead to the same number being generated every time. But that is not something you can or would want to do, for sure :slight_smile:

I see, I probably had a bit of a misunderstanding. I read it as describing layer-wise constant behavior, which is definitely achievable with the current implementation. :+1:
