# Initialize the weights in layers -- Which method is most recommended?

Dear experienced friends,

These days I have been roaming around our PyTorch Forums trying to find a way to initialize a weight matrix, and I found several ways to achieve it. May I ask which one you would recommend most?

Suppose we have a very simple (but typical) neural network, and our goal is to initialize the weights of the first `conv1` layer to `[[0.,0.,0.],[1.,1.,1.],[2.,2.,2.]]` (a `3x3` filter).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.pool = nn.MaxPool2d(2, 2)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        return x
```

So here are several ways we can initialize the weights (huge respect to vmirly1, ptrblck, et al.):

• Method 1: Define the custom weight matrix inside `__init__`:
```python
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 3, 3)
        self.pool = nn.MaxPool2d(2, 2)

        K = torch.tensor([[0.,0.,0.],[1.,1.,1.],[2.,2.,2.]])  # define the custom kernel
        K = torch.unsqueeze(torch.unsqueeze(K, 0), 0)         # reshape to (1, 1, 3, 3)
        self.conv1.weight.data = self.conv1.weight.data * 0 + K  # broadcast K into conv1's weight

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        return x
```

• Method 2: Define the weights after you build the instance (before training, of course):
```python
net = Net()

# then change the weights outside the class
K = torch.tensor([[0.,0.,0.],[1.,1.,1.],[2.,2.,2.]])
K = torch.unsqueeze(torch.unsqueeze(K, 0), 0)  # reshape to (1, 1, 3, 3)
net.conv1.weight.data = net.conv1.weight.data * 0 + K  # note: net.conv1, not net.conv1[0]
```

• Method 3: Use a saved `state_dict` to update the weights (a sketch follows below).
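Since no code was posted for this one, here is a minimal sketch of how I understand it, assuming the `Net` from Method 1 (with `conv1 = nn.Conv2d(1, 3, 3)`): edit a copy of the state dict, then load it back.

```python
net = Net()
state_dict = net.state_dict()

K = torch.tensor([[0.,0.,0.],[1.,1.,1.],[2.,2.,2.]])
# conv1.weight has shape (out_channels, in_channels, kH, kW) = (3, 1, 3, 3),
# so repeat the 3x3 kernel across all 3 output channels.
state_dict['conv1.weight'] = K.unsqueeze(0).unsqueeze(0).repeat(3, 1, 1, 1)

net.load_state_dict(state_dict)  # shapes must match the model exactly
```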

• Method 4: Use a class method to achieve it (from the tutorials):
```python
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 3, 3)
        self.pool = nn.MaxPool2d(2, 2)
        self.init_weights()

    def init_weights(self):
        K = torch.tensor([[0.,0.,0.],[1.,1.,1.],[2.,2.,2.]])
        K = torch.unsqueeze(torch.unsqueeze(K, 0), 0)  # reshape to (1, 1, 3, 3)
        # broadcast so the weight keeps its (3, 1, 3, 3) shape
        self.conv1.weight.data = self.conv1.weight.data * 0 + K

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        return x
```

I think these are all the methods I could find online. May I ask which one is most recommended? Or are any of them risky?

I personally prefer Method 4 because it can be really convenient when, for example, you want to initialize the weights of multiple layers. Here's one example of when it can be helpful:

```python
def initialize_weights(self):
    for m in self.modules():
        if isinstance(m, nn.Conv2d):
            nn.init.kaiming_normal_(m.weight, nonlinearity='relu')
            if m.bias is not None:
                nn.init.constant_(m.bias, 0)
        elif isinstance(m, nn.BatchNorm2d):
            nn.init.constant_(m.weight, 1)
            nn.init.constant_(m.bias, 0)
        elif isinstance(m, nn.Linear):
            nn.init.kaiming_normal_(m.weight, nonlinearity='relu')
            if m.bias is not None:
                nn.init.constant_(m.bias, 0)
```
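You would then call it once after building the model (this assumes `initialize_weights` is defined as a method on the `Net` class above):

```python
net = Net()
net.initialize_weights()  # walks self.modules() and re-initializes every matching layer
```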

I also like @Superklez's approach for the same reason. In case you just want to assign known values to a single layer, I would probably use "Method 1".

While your approaches would work fine, I would not recommend using the `.data` attribute in any of them, as it might yield unwanted side effects. You could assign a new `nn.Parameter` to the `weight` attribute directly (wrapping it in a `with torch.no_grad()` block if necessary), use the `nn.init` methods as seen in @Superklez's code, or use the `.copy_` method in case you want to assign the values directly to a parameter.
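For example, a minimal sketch of the `.copy_` variant, assuming the `Net` from Method 1 and the same kernel `K`:

```python
net = Net()
K = torch.tensor([[0.,0.,0.],[1.,1.,1.],[2.,2.,2.]])

with torch.no_grad():          # keep the assignment out of the autograd graph
    net.conv1.weight.copy_(K)  # (3, 3) broadcasts across the (3, 1, 3, 3) weight
```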


Thank you for your suggestion, Superklez. Your example clearly shows how to use control flow to initialize different kinds of layers. Really helpful!

Hi ptrblck, thank you for the explanation. I just tested the method you mentioned, and it works great:

```python
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 3, 3)
        self.pool = nn.MaxPool2d(2, 2)

        # now assign the parameters
        K = torch.tensor([[0.,0.,0.],[1.,1.,1.],[2.,2.,2.]])
        K = torch.unsqueeze(torch.unsqueeze(K, 0), 0)  # shape (1, 1, 3, 3)

        # repeat across the 3 output channels so the shape stays (3, 1, 3, 3)
        self.conv1.weight = nn.Parameter(K.repeat(3, 1, 1, 1))

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        return x

net = Net()
```

However, I am quite confused by `torch.no_grad()` here. As mentioned in the documentation, inside a `with torch.no_grad()` block gradient calculation is disabled and new results have `requires_grad=False`. Nevertheless, when I print out the parameters, all of them are still trainable. May I ask why this happens?

```python
for param in net.parameters():
    print(param)
```

```
Parameter containing:
tensor([[[[0., 0., 0.],
          [1., 1., 1.],
          ...
```

`torch.no_grad()` will make sure that the operations inside the block are not tracked by Autograd and thus not recorded in the computation graph (as you don't want to backpropagate through the parameter assignment).
The `nn.Parameter` itself should keep its `requires_grad=True` attribute.
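A quick check to illustrate this, using the `net` from your snippet above: the in-place update is untracked, but the flag on the parameter is unchanged.

```python
with torch.no_grad():
    net.conv1.weight.fill_(1.0)        # in-place op, not recorded by autograd

print(net.conv1.weight.requires_grad)  # True -> the optimizer will still update it
```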