Loss is increasing

I have this simple nonlinear function that I want to fit to data, but the loss keeps increasing after each iteration. Any idea what I am doing wrong?

import torch
# w2=2, w1=3

b = 5

x_data = []
y_data = []

for i in range(20):
    i = float(i+1)
    x_data.append(i)
    # y = w2*x^2 + w1*x
    y_data.append(i * i * 2 + i * 3)

w1 = torch.tensor([5.0], requires_grad=True)
w2 = torch.tensor([10.0], requires_grad=True)

# our model forward pass
def forward(x):
    return x * x * w2 + x * w1

# Loss function
def loss(y_pred, y_val):
    return (y_pred - y_val) ** 2

# Before training
print("Prediction (before training)",  4, forward(4).item())

# Training loop
for epoch in range(100):
    for x_val, y_val in zip(x_data, y_data):
        y_pred = forward(x_val) # 1) Forward pass
        l = loss(y_pred, y_val) # 2) Compute loss
        l.backward() # 3) Back propagation to compute gradients
        # print("\tgrad: ", x_val, y_val, w.grad.item())
        w1.data = w1.data - 0.01 * w1.grad.item()
        w2.data = w2.data - 0.01 * w2.grad.item()

        # Manually zero the gradients after updating weights
        w1.grad.data.zero_()
        w2.grad.data.zero_()

        print(l.item())

# After training
print("Prediction (after training)",  4, forward(4).item())

result:
625.0
3969.0
4106.24658203125
10951.541015625
494030.0
146011072.0
176458432512.0
690122123116544.0
7.44382236290397e+18
1.967258245916323e+23
1.1615622546788777e+28
1.4223510151775662e+33

I think .data is deprecated; the network should work without it.
so,

w1.data = w1.data - 0.01 * w1.grad.item()
w1.grad.data.zero_()

becomes

w1 = w1 - 0.01 * w1.grad.item()
w1.grad.zero_()

Also, I think we should use an in-place update; otherwise a new w1 is created whose gradient is None, so zeroing it out is an invalid operation. For example,

w1 = w1 - 0.01 * w1.grad.item()
w1.grad.zero_()

will give the error

AttributeError: 'NoneType' object has no attribute 'zero_'

because we created a new w1, whose grad is None.

So, we use an in-place update,

w1 -= 0.01 * w1.grad.item()
w1.grad.zero_()

This way, a new w1 will not be created.
However, if we modify w1 in place like this, it will give the error

RuntimeError: a leaf Variable that requires grad has been used in an in-place operation.

So, we use

with torch.no_grad():
  w1 -= 0.01 * w1.grad.item()
  w1.grad.zero_()

This ensures that these operations are not tracked by autograd (their results have requires_grad=False), even though requires_grad is set to True on w1.
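For example, a quick check (a minimal sketch; out is just an illustrative name):

import torch

w1 = torch.tensor([5.0], requires_grad=True)
with torch.no_grad():
    out = w1 * 2          # computed inside no_grad, so it is not tracked
print(out.requires_grad)  # False
print(w1.requires_grad)   # True -- the flag on w1 itself is unchanged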

Also, if you use the square in your loss, it gives very big values, as listed in your post. If we instead use just the difference and take its absolute value, it will start working. For example,

# Loss function
def loss(y_pred, y_val):
    return (y_pred - y_val).abs()
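
To get a feel for the difference: at the starting weights, the x = 20 sample alone gives the squared loss a gradient in the millions, so a 0.01 step overshoots badly, while the absolute-loss gradient for w2 is only ±x^2 = ±400. A minimal sketch, reusing the starting weights and the x = 20 sample from your post:

import torch

# starting weights from the post and the x = 20 sample
w1 = torch.tensor([5.0], requires_grad=True)
w2 = torch.tensor([10.0], requires_grad=True)
x = 20.0
y = 2 * x * x + 3 * x                 # true value: 860.0

pred = x * x * w2 + x * w1            # 4100.0
((pred - y) ** 2).backward()          # squared loss
print(w2.grad.item())                 # ~2.6 million, so 0.01 * grad is a step of ~26000

w1.grad.zero_()
w2.grad.zero_()

pred = x * x * w2 + x * w1
(pred - y).abs().backward()           # absolute loss
print(w2.grad.item())                 # 400.0, a much smaller step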

When I make these changes and train for 1000 epochs, I get

Prediction (after training) 4 43.99794006347656

which is very close to the true value 2*4^2 + 3*4 = 44.

Thank you for answering, but I don’t see it converging.

Here is the updated code:

# Loss function
def loss(y_pred, y_val):
    return (y_pred - y_val).abs()

# Training loop
for epoch in range(1000):
    for x_val, y_val in zip(x_data, y_data):
        y_pred = forward(x_val) # 1) Forward pass
        l = loss(y_pred, y_val) # 2) Compute loss
        l.backward() # 3) Back propagation to compute gradients
        # print("\tgrad: ", x_val, y_val, w.grad.item())
        with torch.no_grad():
            w1 -= 0.01 * w1.grad.item()
            w1.grad.zero_()

            w2 -= 0.01 * w2.grad.item()
            w2.grad.zero_()

        print(l.item())

My last loss values are still too high and it is not converging:

27.027191162109375
319.6322021484375
58.488525390625
529.6104125976562
127.1507568359375
814.5077514648438
248.1339111328125
1190.4041748046875
Prediction (after training) 4 113.91207122802734

What am I doing wrong?

I also changed the learning rate to 0.0001.

Got it, it is converging now. I also had to use 50K epochs :slight_smile:
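
For reference, the loop that converged for me looks roughly like this (a sketch reusing forward, loss, x_data and y_data from above, with the 0.0001 learning rate):

for epoch in range(50000):                      # 50K epochs
    for x_val, y_val in zip(x_data, y_data):
        y_pred = forward(x_val)                 # 1) forward pass
        l = loss(y_pred, y_val)                 # 2) compute loss
        l.backward()                            # 3) compute gradients
        with torch.no_grad():                   # update without autograd tracking
            w1 -= 0.0001 * w1.grad.item()
            w1.grad.zero_()
            w2 -= 0.0001 * w2.grad.item()
            w2.grad.zero_()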

thank you!