Multiplying PyTorch's returned weights doesn't give the correct output 'y'

Hey, I'm a newcomer to PyTorch. I created a dummy dataset by generating a random weight vector and bias and computing 'y' as W·Xᵀ + b. I then implemented a simple linear regression using PyTorch's single linear layer and trained it on the same X and y generated above. The RMSE is extremely small and the predicted output is virtually identical to the original y, but the trained weight vector is completely different from the original. The bias is the same, though.
To check whether PyTorch somehow rescales the weights, I divided each element of the trained W by the corresponding element of the original W and got a roughly constant ratio.
For example:

Original_W = [1.85623121, 0.02777083, 0.32021133]
Trained_W = [11.1365,  0.1659,  1.9205]
Trained_W / Original_W = [5.99952201, 5.97389396, 5.99760163]

(The deviation is probably due to training error)
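
In code, the check I did is essentially this (just reproducing the numbers above):

import numpy as np

original_w = np.array([1.85623121, 0.02777083, 0.32021133])
trained_w = np.array([11.1365, 0.1659, 1.9205])

# element-wise ratio: comes out roughly constant (about 6)
print(trained_w / original_w)
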
Can someone please explain?

Thanks!
Saif

Hi Saif!

I’ve implemented a version of what I believe you are doing and
Linear.weight converges to the weight vector used to generate
the training data.

Here is the script:

import torch
print (torch.__version__)

_ = torch.manual_seed (2021)

# "ground truth" weight vector and bias used to generate the training data
a = torch.randn (3)
b = torch.randn (1)

model = torch.nn.Linear (3, 1)

nOpt = 100
nPrint = 10
opt = torch.optim.SGD (model.parameters(), lr = 0.1)
loss_fn = torch.nn.MSELoss()

for  i in range (nOpt):
    # one fresh random training sample per step
    x = torch.randn (3)
    t = a @ x + b
    p = model (x)
    loss = loss_fn (p, t)
    if  i % (nOpt / nPrint) == 0  or  (i + 1) == nOpt:
        print ('loss =', loss)
    opt.zero_grad()
    loss.backward()
    opt.step()

print ('a =', a)
print ('model.weight =', model.weight.data)
print ('b =', b)
print ('model.bias =', model.bias.data)

And here is its output:

1.7.1
loss = tensor(81.5180, grad_fn=<MseLossBackward>)
loss = tensor(0.8514, grad_fn=<MseLossBackward>)
loss = tensor(0.0013, grad_fn=<MseLossBackward>)
loss = tensor(5.5018e-06, grad_fn=<MseLossBackward>)
loss = tensor(1.4914e-05, grad_fn=<MseLossBackward>)
loss = tensor(7.4943e-07, grad_fn=<MseLossBackward>)
loss = tensor(2.5289e-07, grad_fn=<MseLossBackward>)
loss = tensor(1.3833e-07, grad_fn=<MseLossBackward>)
loss = tensor(8.2538e-10, grad_fn=<MseLossBackward>)
loss = tensor(2.2737e-13, grad_fn=<MseLossBackward>)
loss = tensor(1.4211e-12, grad_fn=<MseLossBackward>)
a = tensor([ 2.2871,  0.6413, -0.8615])
model.weight = tensor([[ 2.2871,  0.6413, -0.8615]])
b = tensor([-0.3649])
model.bias = tensor([-0.3649])

Best.

K. Frank

Thanks a lot for your reply and implementation. I copied your code and you're right, it does work. But when I merge your code with mine (the only difference is that I train on the whole X, which has shape [125 x 3]), the weights come out different again. At the end, I print out the RMSE and the last 15 outputs, i.e. y[110:] (you can see they are the same, but the weights are still different; this time even the bias is different).

import numpy as np
import torch
from pathlib import Path

x_path = Path('../datasets/') / 'synthetic_datasets/1/X.npy'
x = np.load(x_path)
x = torch.from_numpy(x)
x = x.type(torch.FloatTensor)

_ = torch.manual_seed (2021)
a = torch.randn (3)
b = torch.randn (1)
model = torch.nn.Linear (3, 1)
t = a @ x.T + b
t = torch.reshape(t, (len(t),1))
print(t.shape)
nOpt = 10000
optimizer = torch.optim.SGD (model.parameters(), lr = 0.01)
loss_fn = torch.nn.MSELoss()

for  i in range (nOpt):
    y_pred = model(x)
    optimizer.zero_grad()
    y_pred = model(x)
    
    loss = loss_fn(y_pred, t)
    loss.backward()
    optimizer.step()
prediction(model, x, t, 'regression')
print ('a =', a)
print ('b =', b)
for param in model_new.parameters():
    print(param)

Output:

torch.Size([125, 3])
1.7.1
torch.Size([125, 1])
tensor([[10.0661],
        [ 9.2046],
        [ 8.3431],
        [ 7.4816],
        [ 6.6201],
        [10.7075],
        [ 9.8460],
        [ 8.9845],
        [ 8.1230],
        [ 7.2614],
        [11.3488],
        [10.4873],
        [ 9.6258],
        [ 8.7643],
        [ 7.9028]]) [[10.0661125]
 [ 9.204607 ]
 [ 8.3431015]
 [ 7.481596 ]
 [ 6.6200905]
 [10.7074585]
 [ 9.845953 ]
 [ 8.9844475]
 [ 8.122942 ]
 [ 7.2614365]
 [11.348804 ]
 [10.487299 ]
 [ 9.625793 ]
 [ 8.764288 ]
 [ 7.902782 ]]
Rmse: 5.683192e-06
a = tensor([ 2.2871, 0.6413, -0.8615])
b = tensor([-0.3649])
Parameter containing:
tensor([[ 4.0516, -9.3466, 6.8186]], requires_grad=True)
Parameter containing:
tensor([1.1346], requires_grad=True)

Hi Saif!

Probably the most practical next step for debugging your issue
would be to go through your code, systematically trimming out
everything you can, while still preserving the issue.

Whittle it down to a short, self-contained, runnable script that
reproduces your issue. Not only is this a good approach to
debugging; it also makes it easier for people on the forum to help
you if you have further questions.

We have no way to tell whether this is right or wrong.

You optimize model, but then you call something called prediction()
and print out the parameters() of something called model_new.

Who knows what model_new is or what its parameters() should
be equal to?
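
For illustration only, here is a sketch of what such a
self-contained, batched version of your setup might look like,
with random data standing in for your X.npy (which I don't have)
and with model.weight printed directly instead of model_new:

import torch

_ = torch.manual_seed (2021)

# "ground truth" weights and bias
a = torch.randn (3)
b = torch.randn (1)

# random stand-in for the [125 x 3] X loaded from X.npy
x = torch.randn (125, 3)
t = (x @ a + b).reshape (-1, 1)

model = torch.nn.Linear (3, 1)
opt = torch.optim.SGD (model.parameters(), lr = 0.01)
loss_fn = torch.nn.MSELoss()

for  i in range (10000):
    opt.zero_grad()
    loss = loss_fn (model (x), t)
    loss.backward()
    opt.step()

print ('a =', a)
print ('model.weight =', model.weight.data)
print ('b =', b)
print ('model.bias =', model.bias.data)

If a script along these lines still shows the mismatch when you
plug in your actual X, post that exact script and its output.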

Good luck.

K. Frank