How to fix/define the initialization weights/seed

How can I fix the weight initialization or define my own initialization for the weights?
I read Weight initialization, but I'm still not sure how to do it.
Let's say I implemented this simple network: https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html

How can I define the weights to be whatever I like at the start?

Thanks


You have to create the init function and apply it to the model:

def weights_init(m):
    if isinstance(m, nn.Conv2d):
        nn.init.xavier_uniform_(m.weight)
        nn.init.zeros_(m.bias)  # bias is 1-D, so use a constant init instead of Xavier
    

model = MyModel()
model.apply(weights_init)

@ptrblck Thank you!!! Makes sense.

Just a follow up question.

Let's say your network is like this:


class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

and you define the weight init like this:

def weights_init(m):
    if isinstance(m, nn.Conv2d):
        nn.init.xavier_uniform_(m.weight)
        nn.init.zeros_(m.bias)

Am I putting it in the right place in the following?


class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def weights_init(m):
        if isinstance(m, nn.Conv2d):
            nn.init.xavier_uniform_(m.weight)
            nn.init.zeros_(m.bias)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

Or do you suggest putting it somewhere else?
I know it is kind of a stupid question, but the reason I'm asking is that I was wondering whether there is a way to put it inside __init__(self). I'm not sure if that is possible, and also not sure if it is even a good idea.
So I was just wondering.

Thanks

You can define it as a class function, but usually it’s defined outside of your model so that it can be reused by other models.
Make sure to add another condition and init for the linear layers :wink:
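
For illustration, a minimal sketch of what that could look like, assuming the MyModel class from above (the combined linear-layer branch and the self.apply variant are illustrative additions, not taken from the replies above):

import torch.nn as nn

def weights_init(m):
    # handle both layer types; kept outside the model so other models can reuse it
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.xavier_uniform_(m.weight)
        nn.init.zeros_(m.bias)

model = MyModel()
model.apply(weights_init)

If you really want to keep everything inside the class, you could instead call self.apply(weights_init) as the last line of __init__, so the model initializes itself on construction.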


I notice that the model uses ReLU activations and therefore plain Xavier initialisation will be suboptimal for the main weights.

A better solution would be to supply the correct gain parameter for the activation.
https://pytorch.org/docs/stable/nn.html#torch.nn.init.calculate_gain

nn.init.xavier_uniform_(m.weight, gain=nn.init.calculate_gain('relu'))

With a ReLU activation this almost gives you the Kaiming initialisation scheme: Kaiming uses either fan_in or fan_out, while Xavier uses the average of fan_in and fan_out.

For the biases though, always initialising them to zero seems like a solid default choice.
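
To make the fan_in/fan_out difference concrete, here is a small sketch comparing the two uniform bounds; the layer shape is only an example (borrowed from fc1 above), not part of the reply:

import math

fan_in, fan_out = 400, 120           # e.g. fc1 above: 16 * 5 * 5 -> 120
gain = math.sqrt(2.0)                # nn.init.calculate_gain('relu')

# xavier_uniform_: bound = gain * sqrt(6 / (fan_in + fan_out))
xavier_bound = gain * math.sqrt(6.0 / (fan_in + fan_out))

# kaiming_uniform_ (fan_in mode): bound = gain * sqrt(3 / fan_in)
kaiming_bound = gain * math.sqrt(3.0 / fan_in)

print(xavier_bound, kaiming_bound)   # close but not identical, since (fan_in + fan_out) / 2 != fan_in here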


After doing this, do I also have to call torch.cuda.manual_seed to fix the weights?

Setting the seed before initializing the parameters will make sure to use the same pseudo-random values the next time you are executing the script.
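
For example (a sketch; weights_init is the function defined earlier in the thread):

torch.manual_seed(0)        # seed the CPU RNG
torch.cuda.manual_seed(0)   # also seed the GPU RNG, if parameters are created on the GPU

model = MyModel()           # the default layer init draws from the seeded RNG
model.apply(weights_init)   # and so does the custom init, so re-running gives the same weights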

I'm not sure what you mean by "fix the weights". Could you explain it a bit?

Hi Ptr,
The network I am using has Dropout and Pooling layers, so when I use weight initialization as follows:

def weights_initnot(m):
    # this call happens before any type check, so it fails for modules
    # that have no .weight attribute (Dropout, MaxPool, ...)
    xavier = torch.nn.init.xavier_uniform_(m.weight.data, init.calculate_gain('relu'))
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        xavier(m.weight.data)
        # print('come xavier')
        # xavier(m.bias.data)
    elif classname.find('BatchNorm') != -1:
        m.weight.data.normal_(1.0, 0.02)
        m.bias.data.fill_(0)
    elif classname.find('Linear') != -1:
        m.weight.data.normal_(0.0, 0.02)
        m.bias.data.fill_(0)
    # elif classname.find('Dropout') != -1:

I get errors saying there is no weight attribute in the Dropout or Pool class, which makes sense.

Then I used the following weight init based on isinstance. It more or less solved the problem, because with the same seed I get the same weights at a given index of a given layer.

seed = 1
torch.cuda.manual_seed(seed)


def weights_init(m):
    if isinstance(m, nn.Conv3d):
        torch.nn.init.xavier_uniform_(m.weight.data, torch.nn.init.calculate_gain('relu'))
        # torch.nn.init.xavier_uniform_(m.bias.data)
    elif isinstance(m, nn.BatchNorm3d):
        m.weight.data.normal_(mean=1.0, std=0.02)
        m.bias.data.fill_(0)
    elif isinstance(m, nn.Linear):
        m.weight.data.normal_(0.0, 0.02)
        m.bias.data.fill_(0)


Net = ResNet3D().to(device)
Net.apply(weights_init)

On another note:
Do seed_torch and manual_seed work the same way? I have seen some code use seed_torch.

Seeking an explanation: I was talking to one of my friends about the purpose of a fixed weight initialization. He said it has no big impact, because the data we feed in is shuffled randomly by the DataLoader anyway, and since we save the best-performing weights, fixing the initial weights is not a big deal.

What do you think about it?

And why are you using the uniform distribution, please?

xavier_uniform_ was just used as an example for no particular reason.
You can of course use whatever nn.init method fits your use case best. :wink:
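
For instance, any of these would be a valid drop-in inside the weights_init condition above (purely illustrative alternatives):

nn.init.xavier_normal_(m.weight)                          # Xavier, normal distribution
nn.init.kaiming_uniform_(m.weight, nonlinearity='relu')   # Kaiming, uniform distribution
nn.init.kaiming_normal_(m.weight, nonlinearity='relu')    # Kaiming, normal distribution
nn.init.normal_(m.weight, mean=0.0, std=0.02)             # plain Gaussian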


I want to initialize the weights for every layer (irrespective of the initialization method) using a constant seed value. How exactly is this done in PyTorch?

If you want to set the same seed before each initialization, you could add torch.manual_seed(SEED) to the weight_init method (before each torch.nn.init call).

I want each linear layer's weights/biases to be initialized with the same constant values. The following is the weight_init() method written the way you suggested:

def weight_init(m):
    if isinstance(m, torch.nn.Linear):
        torch.manual_seed(786)
        torch.nn.init.xavier_uniform_(m.weight.data)
        torch.nn.init.xavier_uniform_(m.bias.data)  # this line raises the ValueError below

Applying the above to the linear layers in the classifier part of VGG19:

model = VGG19()
model.apply(weight_init)

The following error is generated each time I call model.apply(weight_init):

ValueError: Fan in and fan out can not be computed for tensor with fewer than 2 dimensions

Where am I making a mistake?

You cannot apply xavier_uniform_ to the bias, as the tensor needs more than a single dimension.

Initializing the bias values differently will ruin my experiment. Is there a workaround other than initializing them with 0’s/1’s?

torch.nn.init.zeros_ or torch.nn.init.ones_ should work.
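
Putting the pieces together, the weight_init from above could then look like this (a sketch; only the bias line changes, and the seed value 786 is carried over from the snippet above):

def weight_init(m):
    if isinstance(m, torch.nn.Linear):
        torch.manual_seed(786)
        torch.nn.init.xavier_uniform_(m.weight)
        torch.nn.init.zeros_(m.bias)  # bias is 1-D, so use zeros_/ones_/constant_ instead of Xavier

model = VGG19()
model.apply(weight_init)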