Hi all,

I am trying to build the simplest possible network, a single linear layer, to fit a linear regression, just to better understand how PyTorch works. However, I ran into a strange issue during training.

In my model's `__init__()` method, I have to add a manual initialization step (shown below) for the model to converge quickly to my regression function. (The weight values 2 and 3 are arbitrary; I could put any values here and the model would still converge.)

```
self.layer1.weight = torch.nn.Parameter(torch.Tensor([2, 3]))
```

Without this line, the model never converges; the training loss just oscillates randomly in the range of hundreds of thousands. With this line, the loss quickly decreases to near 1.

My first guess was that the default initial weights were too small, and that the model only converges when they start far from zero. But when I changed the initial values, I found that convergence always works as long as this line is present; the exact values I set do not matter. Could someone explain what is going on behind the scenes here? Thanks.

My entire script:

```
import torch
import numpy as np


class Net(torch.nn.Module):
    def __init__(self, input_dim, output_dim):
        super(Net, self).__init__()
        self.layer1 = torch.nn.Linear(input_dim, output_dim, bias=False)
        self.layer1.weight = torch.nn.Parameter(torch.Tensor([2, 3]))

    def forward(self, x):
        x = self.layer1(x)
        x.squeeze()
        return x


# generate data using the linear regression setup y = 5 * x1 + 3 * x2
sample_size = 10000
input_dim = 2
output_dim = 1
epoch = 30
bs = 100

data = np.random.randn(sample_size, 3)
data[:, :2] = data[:, :2] * 100
# add a normal noise term
data[:, 2] = 5 * data[:, 0] + 3 * data[:, 1] + np.random.randn(sample_size)
data = torch.Tensor(data)
train_x = data[:, :input_dim]
train_y = data[:, input_dim]

net = Net(input_dim, output_dim)
net.zero_grad()
criterion = torch.nn.MSELoss()
optimizer = torch.optim.RMSprop(net.parameters(), lr=.01)

for i in range(epoch):
    batch = 0
    while batch * bs < train_x.shape[0]:
        batch_x = train_x[batch * bs : (batch + 1) * bs, :]
        batch_y = train_y[batch * bs : (batch + 1) * bs]
        pred_y = net.forward(batch_x)
        loss = criterion(pred_y, batch_y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if batch % 100 == 0:
            #print(f"{i} {batch} {loss}")
            print(net.layer1.weight)
        batch += 1
```
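
One check that might narrow this down is comparing tensor shapes with and without the override line. The sketch below is not part of my script. It builds the same layer twice (the names `default_layer` and `patched_layer` are made up for illustration), feeds both a random batch, and prints the shapes. It assumes a reasonably recent PyTorch in which `torch.nn.Linear` accepts a 1-D weight, as my script relies on.

```python
import torch

torch.manual_seed(0)

# Same layer as in the script: 2 inputs, 1 output, no bias, default init.
default_layer = torch.nn.Linear(2, 1, bias=False)

# Same layer, but with the weight replaced by a 1-D parameter,
# exactly as the override line in __init__ does.
patched_layer = torch.nn.Linear(2, 1, bias=False)
patched_layer.weight = torch.nn.Parameter(torch.Tensor([2, 3]))

# Hypothetical stand-ins for one batch of inputs and targets.
batch_x = torch.randn(100, 2)
batch_y = torch.randn(100)

default_pred = default_layer(batch_x)
patched_pred = patched_layer(batch_x)

print(default_layer.weight.shape)  # torch.Size([1, 2])
print(patched_layer.weight.shape)  # torch.Size([2])
print(default_pred.shape)          # torch.Size([100, 1])
print(patched_pred.shape)          # torch.Size([100])

# squeeze() returns a new tensor; calling it without using the
# result leaves the original tensor's shape unchanged.
default_pred.squeeze()
print(default_pred.shape)          # torch.Size([100, 1])

# Subtracting a (100,) target from a (100, 1) prediction broadcasts.
diff_shape = (default_pred - batch_y).shape
print(diff_shape)                  # torch.Size([100, 100])
```

If the shapes really differ like this, then with the default weight the prediction and the target do not line up element by element, and the loss would be averaged over a broadcast `(100, 100)` difference. That may be related to what I am seeing, but I have not confirmed it in the full training loop.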