Why is my linear regression loss always NaN?

I am using ChatGPT to learn linear regression, but I don't understand why it can't predict.
Where is the mistake?
Epoch 400/1000, Loss: nan
Epoch 500/1000, Loss: nan
Epoch 600/1000, Loss: nan
Epoch 700/1000, Loss: nan
Epoch 800/1000, Loss: nan
Epoch 900/1000, Loss: nan
Epoch 1000/1000, Loss: nan
Predicted spending amount: nan

import torch
import torch.nn as nn
import torch.optim as optim

# 1. Data preparation: construct ages of elderly people (feature) and spending amounts (target)
# Note: the data must be 2-D tensors, with each row representing one sample
ages = torch.tensor([[65], [70], [75], [80], [85], [90], [95], [100]], dtype=torch.float32)
spendings = torch.tensor([[200], [250], [300], [350], [400], [450], [500], [550]], dtype=torch.float32)

# 2. Model construction: build a simple linear regression model with nn.Sequential
# There is only a single linear layer, mapping the 1 input feature to 1 output
model = nn.Sequential(
    nn.Linear(1, 1)
)

# 3. Define the loss function and the optimizer
# Use mean-squared-error loss (MSELoss) and stochastic gradient descent (SGD) as the optimizer
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.001)

# 4. Train the model
num_epochs = 1000  # number of training epochs
for epoch in range(num_epochs):
    optimizer.zero_grad()       # clear the gradients from the previous step
    predictions = model(ages)   # forward pass: predict spending with the current model
    loss = criterion(predictions, spendings)  # mean-squared error between predictions and targets
    loss.backward()             # backward pass: compute gradients
    optimizer.step()            # update the model parameters

    # print the current loss every 100 epochs
    if (epoch + 1) % 100 == 0:
        print(f"Epoch {epoch+1}/{num_epochs}, Loss: {loss.item():.4f}")

# 5. Use the trained model to make a prediction
model.eval()  # switch to evaluation mode (disables training-only features such as dropout)
with torch.no_grad():  # disable gradient tracking to make inference cheaper
    new_age = torch.tensor([[77.0]], dtype=torch.float32)  # new input: a 77-year-old person
    predicted_spending = model(new_age)  # get the prediction
    print("Predicted spending amount:", predicted_spending.item())

Hi bbhxwl!

The problem is that your training is unstable – the gradient is big, each step is big
enough to make things worse, and the next gradient is even bigger. (Becoming
unstable in this way is commonplace with gradient-based optimization algorithms
such as SGD.)
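
If you want to watch this happening, here is a small diagnostic sketch, using the same data
and settings as your code, that prints the loss at every one of the first several steps. The
loss grows by orders of magnitude per step and soon overflows to inf and then nan.

import torch
import torch.nn as nn
import torch.optim as optim

# same data, model, and optimizer settings as in the question
ages = torch.tensor([[65], [70], [75], [80], [85], [90], [95], [100]], dtype=torch.float32)
spendings = torch.tensor([[200], [250], [300], [350], [400], [450], [500], [550]], dtype=torch.float32)

model = nn.Sequential(nn.Linear(1, 1))
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.001)

# print the loss at every step so the blow-up is visible
for epoch in range(30):
    optimizer.zero_grad()
    loss = criterion(model(ages), spendings)
    loss.backward()
    optimizer.step()
    print(epoch, loss.item())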

The root cause of this problem is that your data is of order 100, while, by default,
your Linear is (randomly) initialized to be appropriate for data of order one. The
preferred approach would be to normalize your data to be of order one and train with
the normalized data.
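
For example, here is a rough sketch of that approach (standardizing to zero mean and unit
standard deviation and using a learning rate of 0.1 are just illustrative choices): standardize
both ages and spendings, train on the standardized data, and un-normalize the prediction at
the end.

import torch
import torch.nn as nn
import torch.optim as optim

ages = torch.tensor([[65], [70], [75], [80], [85], [90], [95], [100]], dtype=torch.float32)
spendings = torch.tensor([[200], [250], [300], [350], [400], [450], [500], [550]], dtype=torch.float32)

# standardize the feature and the target so that both are of order one
age_mean, age_std = ages.mean(), ages.std()
spend_mean, spend_std = spendings.mean(), spendings.std()
ages_n = (ages - age_mean) / age_std
spendings_n = (spendings - spend_mean) / spend_std

model = nn.Sequential(nn.Linear(1, 1))
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)  # an ordinary learning rate is now stable

for epoch in range(1000):
    optimizer.zero_grad()
    loss = criterion(model(ages_n), spendings_n)
    loss.backward()
    optimizer.step()

# standardize the new input and un-normalize the prediction
with torch.no_grad():
    new_age_n = (torch.tensor([[77.0]]) - age_mean) / age_std
    predicted_spending = model(new_age_n) * spend_std + spend_mean
    print(predicted_spending.item())

Because your data is exactly linear, this should converge to a prediction of roughly 320 for a
77-year-old, in line with the results of the tweaked runs below.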

However, you can also lower your learning rate, which makes your steps smaller and
eliminates the instability. This does slow your training down, but that can often be
addressed by using momentum with SGD. For your particular use case, a momentum value
quite close to one can be appropriate.

Here is a tweaked version of your code that shows stable training without and with
momentum:

import torch
print(torch.__version__)

import torch.nn as nn
import torch.optim as optim

torch.manual_seed(2025)

useMomentum = False

num_epochs = 1000000
momentum = 0.0
if useMomentum:
    num_epochs = 10000
    momentum = 0.99

print('useMomentum:', useMomentum)

ages = torch.tensor([[65], [70], [75], [80], [85], [90], [95], [100]], dtype=torch.float32)
spendings = torch.tensor([[200], [250], [300], [350], [400], [450], [500], [550]], dtype=torch.float32)

model = nn.Sequential(
    nn.Linear(1, 1)
)

criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=1.e-4, momentum=momentum)

for epoch in range(num_epochs):
    optimizer.zero_grad()
    predictions = model(ages)
    loss = criterion(predictions, spendings)
    loss.backward()
    optimizer.step()

    if (epoch + 1) % (num_epochs // 10) == 0:
        print(f"Epoch {epoch+1}/{num_epochs}, Loss: {loss.item():.4f}")
    
model.eval()
with torch.no_grad():
    new_age = torch.tensor([[77.0]], dtype=torch.float32)
    predicted_spending = model(new_age)
    print("predicted_spending.item():", predicted_spending.item())

Here is its output with plain-vanilla SGD (no momentum):

2.6.0+cu126
useMomentum: False
Epoch 100000/1000000, Loss: 1805.0293
Epoch 200000/1000000, Loss: 846.9705
Epoch 300000/1000000, Loss: 397.5519
Epoch 400000/1000000, Loss: 186.5436
Epoch 500000/1000000, Loss: 87.4728
Epoch 600000/1000000, Loss: 41.1881
Epoch 700000/1000000, Loss: 19.4633
Epoch 800000/1000000, Loss: 9.1225
Epoch 900000/1000000, Loss: 4.4042
Epoch 1000000/1000000, Loss: 2.1364
predicted_spending.item(): 320.89605712890625

And here is its output with momentum = 0.99 (and many fewer training epochs):

2.6.0+cu126
useMomentum: True
Epoch 1000/10000, Loss: 1910.8076
Epoch 2000/10000, Loss: 865.7906
Epoch 3000/10000, Loss: 393.9175
Epoch 4000/10000, Loss: 179.2241
Epoch 5000/10000, Loss: 81.5438
Epoch 6000/10000, Loss: 37.1007
Epoch 7000/10000, Loss: 16.8801
Epoch 8000/10000, Loss: 7.6801
Epoch 9000/10000, Loss: 3.4943
Epoch 10000/10000, Loss: 1.5898
predicted_spending.item(): 320.7725524902344

Best.

K. Frank