# Why does the model fail to converge without a manual weight initialization?

Hi all,

I am trying to build the simplest possible network, a single linear layer, to fit a linear regression, just to help myself better understand how PyTorch works. However, I ran into a strange issue with the model training.

In my model's `__init__()` method, I have to add a manual initialization step (shown below) for the model to converge quickly to my regression function. (The weight values 2 and 3 are arbitrary; I could put any values here and the model would still converge.)

``````
self.layer1.weight = torch.nn.Parameter(torch.Tensor([2, 3]))
``````
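For reference, here is a minimal sketch of what this assignment changes, assuming the layer is built as `torch.nn.Linear(2, 1, bias=False)` as in the script further down. The variable names are only for illustration. It compares the shape of the default weight with the shape of the manually assigned one, along with the output shape each produces for a batch of 100 rows.

```python
import torch

# Same layer configuration as in the script: 2 inputs, 1 output, no bias.
layer = torch.nn.Linear(2, 1, bias=False)
x = torch.randn(100, 2)  # synthetic batch, for shape inspection only

# Default initialization: weight has shape (out_features, in_features).
default_weight_shape = tuple(layer.weight.shape)
default_out_shape = tuple(layer(x).shape)

# Manual assignment with a 1-D tensor replaces the 2-D weight.
layer.weight = torch.nn.Parameter(torch.Tensor([2, 3]))
manual_weight_shape = tuple(layer.weight.shape)
manual_out_shape = tuple(layer(x).shape)

print("default:", default_weight_shape, default_out_shape)
print("manual: ", manual_weight_shape, manual_out_shape)
```

So the manual line does more than change the initial values: it also changes the weight from 2-D to 1-D, which in turn changes the shape of the layer's output.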

Without this line, the model never converges; the training loss just oscillates randomly in the hundreds of thousands. With this line, it quickly decreases to near 1.

My first guess was that the default initial weights are too small, so the model only trains if I initialize them far away from zero. But when I changed the initial values, I found that convergence always works as long as this line is present; the exact values I set do not matter. Could someone explain what is going on behind the scenes here? Thanks.
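Since the values themselves do not seem to matter, one thing that might be worth checking, as a hedged sketch rather than a confirmed diagnosis, is whether the prediction and target shapes match when the loss is computed. The tensors below are synthetic stand-ins for the real batches: `pred` has shape `(100, 1)` and holds perfect predictions, while `target` has shape `(100,)`. Note that a bare `squeeze()` call returns a new tensor and leaves the original unchanged.

```python
import warnings
import torch

torch.manual_seed(0)
target = torch.randn(100)           # shape (100,)
pred = target.clone().unsqueeze(1)  # shape (100, 1), identical values

# squeeze() is not in-place: the result is discarded, pred keeps its shape.
pred.squeeze()
shape_after_bare_squeeze = tuple(pred.shape)

criterion = torch.nn.MSELoss()

# Mismatched shapes: (100, 1) vs (100,) broadcast to (100, 100),
# so every prediction is compared against every target.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # PyTorch warns about the size mismatch
    loss_broadcast = criterion(pred, target).item()

# Matched shapes: element-wise comparison, so perfect predictions give 0.
loss_matched = criterion(pred.squeeze(1), target).item()

print(shape_after_bare_squeeze, loss_broadcast, loss_matched)
```

If the shapes do differ in the real script, the loss would be averaging over all pairs of predictions and targets instead of matching pairs, which could explain a loss that never settles.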

My entire script:

``````
import torch
import numpy as np

class Net(torch.nn.Module):

    def __init__(self, input_dim, output_dim):
        super(Net, self).__init__()
        self.layer1 = torch.nn.Linear(input_dim, output_dim, bias=False)
        self.layer1.weight = torch.nn.Parameter(torch.Tensor([2, 3]))

    def forward(self, x):
        x = self.layer1(x)
        x.squeeze()
        return x

# generate data using the linear regression setup y = 5 * x1 + 3 * x2

sample_size = 10000
input_dim = 2
output_dim = 1
epoch = 30
bs = 100

data = np.random.randn(sample_size, 3)
data[:, :2] = data[:, :2] * 100
# add a normal noise term
data[:, 2] = 5 * data[:, 0] + 3 * data[:, 1] + np.random.randn(sample_size)
data = torch.Tensor(data)
train_x = data[:, :input_dim]
train_y = data[:, input_dim]

net = Net(input_dim, output_dim)
criterion = torch.nn.MSELoss()
optimizer = torch.optim.RMSprop(net.parameters(), lr=.01)

for i in range(epoch):

    batch = 0
    while batch * bs < train_x.shape[0]:

        batch_x = train_x[batch * bs : (batch + 1) * bs, :]
        batch_y = train_y[batch * bs : (batch + 1) * bs]

        pred_y = net.forward(batch_x)
        loss = criterion(pred_y, batch_y)