A simple example, but why doesn’t it work?

import torch
import torch.nn as nn
N,D_in,H,D_out=64,1000,100,10
model=torch.nn.Sequential(
    torch.nn.Linear(D_in,H,bias=True),
    torch.nn.ReLU(),
    torch.nn.Linear(H,D_out,bias=True)
)
#torch.nn.init.normal_(model[0].weight)
#torch.nn.init.normal_(model[2].weight)
loss_fn=torch.nn.MSELoss(reduction='sum')
x=torch.randn(N,D_in)
y=torch.randn(N,D_out)
learning_rate=1e-3
for i in range(1000000):

    y_pred=model(x)
    loss=loss_fn(y,y_pred)
    if(i%2000==0):
        print(i,loss.item())
    loss.backward()
    #with torch.no_grad():
    for param in model.parameters():
        param=param-param.grad*learning_rate

output:
0 654.9241333007812
2000 654.9241333007812
4000 654.9241333007812
6000 654.9241333007812

The loss is not decreasing…

Hi shinra!

This is more of a Python question than something specific to PyTorch.

In:

    for param in model.parameters():

param is a Python variable, which means that it is a reference to
something. In each iteration of the for loop it is set to refer
to one of the parameters in model.parameters().

But when you run the body of the loop:

        param=param-param.grad*learning_rate

you first create a new tensor, param-param.grad*learning_rate,
and then rebind the variable param to refer to that new tensor.

This changes what param refers to, but doesn’t change the value
of the parameter in model.parameters() to which param used to
refer.

Consider the pure python:

a = [1, 2, 3]
b = a
b = [4, 5, 6]

You can easily verify that after running this b refers to a list that
has the value [4, 5, 6], but that a still refers to the list with value
[1, 2, 3].

This is the same thing that is happening with your variable param
and the actual parameter in model.parameters() that param
temporarily referred to.
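For contrast, here is a small pure-Python sketch of the in-place
alternative: mutating the object that both names refer to, rather than
rebinding one of the names:

a = [1, 2, 3]
b = a
b[:] = [4, 5, 6]   # mutates the list in place, so a sees the change
print(a)           # prints [4, 5, 6]

Updating a parameter in place (rather than rebinding the loop variable)
is the analogous fix in your case.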

(The most convenient way to do what you want is to use
opt = torch.optim.SGD(...), and call opt.step().
Don’t forget to “zero your grads” (e.g., opt.zero_grad())
before calling loss.backward() in each iteration of your
training loop.)
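
To make that concrete, here is roughly what such a loop could look
like, reusing model, loss_fn, x, y, and learning_rate from your code
(a sketch, untested):

opt = torch.optim.SGD(model.parameters(), lr=learning_rate)
for i in range(1000000):
    opt.zero_grad()            # clear gradients from the previous iteration
    y_pred = model(x)
    loss = loss_fn(y_pred, y)
    if i % 2000 == 0:
        print(i, loss.item())
    loss.backward()            # compute gradients
    opt.step()                 # update the parameters in place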

Best.

K. Frank


Thank you! I have got the crux of the problem. Is there an easy way to update the weights in model.parameters() without using step()?

Hi shinra!

I believe the following should work:

    with torch.no_grad():
        for param in model.parameters():
            param.copy_(param - param.grad * learning_rate)

param.copy_() changes “in place” the data stored in the parameter
referred to by param, rather than changing the variable param to
refer to something else.
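
Putting this into your training loop would look something like the
following sketch (note that you also need to zero the gradients each
iteration, e.g. with param.grad.zero_(), because backward() accumulates
them):

for i in range(1000000):
    y_pred = model(x)
    loss = loss_fn(y_pred, y)
    if i % 2000 == 0:
        print(i, loss.item())
    loss.backward()
    with torch.no_grad():
        for param in model.parameters():
            param.copy_(param - param.grad * learning_rate)  # in-place update
            param.grad.zero_()                               # clear accumulated gradient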

Best.

K. Frank


It works well. Thanks so much! (I should really try to read the docs more.)