Layers are not initialized with the same weights with manual seed

When setting all seeds manually, I would expect that all new layers of a given type have the same initial weights. However, that is not the case.

import torch
from torch import nn
import os
import numpy as np
import random

torch.manual_seed(3)
torch.cuda.manual_seed_all(3)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
np.random.seed(3)
random.seed(3)
os.environ['PYTHONHASHSEED'] = str(3)

linear = nn.Linear(5, 2)
linear2 = nn.Linear(5, 2)

print(linear.weight)
print(linear2.weight)

This will print:

Parameter containing:
tensor([[ 0.2083, -0.4267, -0.2550,  0.4328, -0.1910],
        [ 0.1918,  0.1967, -0.3020, -0.1627, -0.1698]], requires_grad=True)
Parameter containing:
tensor([[-0.0255, -0.1283,  0.3900, -0.0621, -0.3761],
        [ 0.1991,  0.3531, -0.1468, -0.1808,  0.4017]], requires_grad=True)

Since the initialization scheme of Linear is the same for every instance, and random effects have been removed by manually setting the seed, how do the weights differ? And what is the reason for them being different?


The reason is that generating numbers changes the state of the random number generator.
If you set the seed back and then create the layer again, you will get the same weights:

import torch
from torch import nn

torch.manual_seed(3)
linear = nn.Linear(5, 2)

torch.manual_seed(3)
linear2 = nn.Linear(5, 2)

print(linear.weight)
print(linear2.weight)
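You can also verify this programmatically, for example:

print(torch.equal(linear.weight, linear2.weight))  # prints True: both layers got identical weights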

Yeah, I figured this out yesterday as well. I keep forgetting how seeding and random generators work. But thanks for the confirmation!


Is there any way to always fix the seed at a certain number so that I don’t have to call torch.manual_seed(3) every single time I initialize an nn.Linear?

The way it is right now makes reproducibility really hard.

If you fixed the seed to a given number at all times, then your random number generator would return the same number every time. I don’t think that is what you want, right?

I do want my random number generator to return the same number all the time in this case. I’m implementing my own transformer and want to make sure that it matches the PyTorch transformer. Is there any way to fix the seed at all times, or another way to accomplish what I want?

Could you explain what you mean by this, please? I am not sure what behavior you expect to see.
Could you give a code sample that shows what you want to get?

Just like what you said above, if I want to initialize two linear layers with the same weights right now, I’d have to do

import torch
from torch import nn

torch.manual_seed(3)
linear = nn.Linear(5, 2)

torch.manual_seed(3)
linear2 = nn.Linear(5, 2)

This forces me to repeat torch.manual_seed(3) everywhere if I have a lot of linear layers and want to make sure they all initialize to the same weights. Is there a function (let’s call it torch.fix_global_seed) such that if I do

import torch
from torch import nn

torch.fix_global_seed(3)
linear = nn.Linear(5, 2)
linear2 = nn.Linear(5, 2)
linear3 = nn.Linear(5, 2)
linear4 = nn.Linear(5, 2)

all four linear layers would be initialized with the same weights?

There is no such function, no.
You don’t actually want to fix the global seed; you want to reset it after each layer is created, right?
If the seed were truly fixed globally, the generator would only ever produce a single number, which is not what you want.

I think the simplest approach here would be to create a custom Linear module that inherits from the existing Linear and sets the seed before calling the original Linear’s initialization.
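A minimal sketch of that idea could look like this (the class name SeededLinear and its seed argument are just made up for illustration, not an existing PyTorch API):

import torch
from torch import nn

class SeededLinear(nn.Linear):
    def __init__(self, in_features, out_features, seed=3, **kwargs):
        # Reset the global RNG state right before nn.Linear.__init__
        # samples the weight and bias tensors.
        torch.manual_seed(seed)
        super().__init__(in_features, out_features, **kwargs)

linear = SeededLinear(5, 2)
linear2 = SeededLinear(5, 2)
print(torch.equal(linear.weight, linear2.weight))  # True: both layers start from the same RNG state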

Oh you mean all the numbers in the Linear’s weight tensor would be 42 if I fix the global seed to 42, right?

The custom Linear Module sounds like a great idea! Thanks!

It might not be that exact value, but yes, it would give you a tensor like [123, 123, 123, 123, 123], for example.
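Here is a small sketch that simulates such a “fixed” seed by resetting it before every single draw:

import torch

values = []
for _ in range(5):
    torch.manual_seed(42)          # simulate a seed that stays fixed before every draw
    values.append(torch.rand(1))   # each draw restarts the sequence, so it returns the same number
print(torch.cat(values))           # a 5-element tensor with all entries identical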

Makes sense. Thanks!

[123, 123, 123, 123, 123]

@albanD Why would I get something like that? The seed defines the initialization of the random sequence, so one seed should generate a sequence of numbers, allowing for tensors of the following form, for example:

[123, 523, 102, 12, 36]

and across different linear layers they should remain the same, which means

linear.weight == linear2.weight

Hi,

Setting the seed would have the behavior you describe, yes.
The question above (if I recall correctly) was about forcing the seed to remain at a given value, which would be like setting it back after each number is generated. That would lead to the same number being generated every time. But that is not something you can or would want to do, for sure :slight_smile:

I see, I probably had a bit of a misunderstanding. I read it as describing layer-wise constant behavior, which is definitely achievable with the current implementation. :+1:
