# Simple Model Code Review

I am learning PyTorch using the d2l online book as a reference, and I want to train a simple single-input/single-output model of the form
`y = ReLU(w_1 x + b_1) + ReLU(w_2 x + b_2)`, where `x` is the input and `w_i`, `b_i` for `i = 1, 2` are parameters.
If someone could review the code below and point out why the parameter vector is not converging to sensible values, I would really appreciate it. Originally I thought that the line `l.sum().backward()` was the culprit, but I computed the gradients manually from the individual components (similar to the expression in the `model` function) and got the same result. I also modified the training data to have only two linear parts (thinking that this would be easier for the model to approximate), but the model still failed. Thanks so much.
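For reference when checking gradients by hand, here is a minimal pure-Python sketch (no PyTorch) of the per-example gradients of the squared loss `l = (y_hat - y)**2 / 2` for the intended model `y_hat = ReLU(w_1 x + b_1) + ReLU(w_2 x + b_2)`: `dl/dw_i = (y_hat - y) * [w_i x + b_i > 0] * x` and `dl/db_i = (y_hat - y) * [w_i x + b_i > 0]`. The helper names `forward`, `grads`, and `finite_diff` and the sample values are hypothetical, and the sketch assumes `ReLU'(0) = 0`. A unit whose pre-activation is negative gets exactly zero gradient, so a unit that is inactive on every training point stops learning.

```python
def relu(z):
    return max(z, 0.0)


def forward(x, w, b):
    # y_hat = relu(w_1 * x + b_1) + relu(w_2 * x + b_2)
    return relu(w[0] * x + b[0]) + relu(w[1] * x + b[1])


def grads(x, y, w, b):
    """Hand-derived gradients of l = (y_hat - y)**2 / 2 w.r.t. w and b."""
    residual = forward(x, w, b) - y  # dl/dy_hat
    dw, db = [], []
    for wi, bi in zip(w, b):
        active = 1.0 if wi * x + bi > 0 else 0.0  # relu'(z), taking relu'(0) = 0
        dw.append(residual * active * x)
        db.append(residual * active)
    return dw, db


def finite_diff(x, y, w, b, eps=1e-6):
    # Central differences, used only to check grads()
    def loss(w_, b_):
        return (forward(x, w_, b_) - y) ** 2 / 2

    dw, db = [], []
    for i in range(2):
        wp, wm = list(w), list(w)
        wp[i] += eps
        wm[i] -= eps
        dw.append((loss(wp, b) - loss(wm, b)) / (2 * eps))
        bp, bm = list(b), list(b)
        bp[i] += eps
        bm[i] -= eps
        db.append((loss(w, bp) - loss(w, bm)) / (2 * eps))
    return dw, db


# Unit 1 is active at x = 3 (pre-activation 1.6), unit 2 is not (-0.3)
dw, db = grads(3.0, 5.0, [0.5, -0.2], [0.1, 0.3])
fdw, fdb = finite_diff(3.0, 5.0, [0.5, -0.2], [0.1, 0.3])
print(dw, db)
print(fdw, fdb)
```

Comparing `grads` with `finite_diff` at points away from the ReLU kinks is a quick way to confirm a hand-derived gradient; for a single example, both should match what `l.sum().backward()` stores in `.grad`.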

```python
import random

import torch
from matplotlib import pyplot as plt
from matplotlib_inline import backend_inline  # replaces the deprecated IPython display helper


def make_data(mean, std, num_examples):
    # Make a piecewise linear data distribution with inputs x ~ N(mean, std^2)
    x = torch.normal(mean, std, (num_examples, 1))
    y = torch.zeros(num_examples, 1)
    for i, xi in enumerate(x):
        if xi < 0:
            y[i] = xi + 2
        elif xi < 5:
            y[i] = 2 * xi - 1
        else:
            y[i] = -xi + 1
    return x, y


def data_iter(batch_size, features, labels):
    # Takes a batch size, a matrix of features, and a vector of labels,
    # yielding minibatches of the size batch_size
    num_examples = len(features)
    indices = list(range(num_examples))
    random.shuffle(indices)
    for i in range(0, num_examples, batch_size):
        batch_indices = torch.tensor(indices[i:min(i + batch_size, num_examples)])
        yield features[batch_indices], labels[batch_indices]


def relu(X):
    # Element-wise max(X, 0); the original version never returned anything
    a = torch.zeros_like(X)
    return torch.max(X, a)


def model(features, w, b):
    # One input, one output: relu(w_1 * x + b_1) + relu(w_2 * x + b_2).
    # w[0], w[1], b[0], b[1] have shape (1,) and broadcast over the batch.
    return relu(w[0] * features + b[0]) + relu(w[1] * features + b[1])


def squared_loss(y_hat, y):
    return (y_hat - y.reshape(y_hat.shape)) ** 2 / 2


def sgd(params, lr, batch_size):
    # In-place updates of leaf tensors must not be tracked by autograd
    with torch.no_grad():
        for param in params:
            param -= lr * param.grad / batch_size
            # PyTorch accumulates gradients, so reset them after each step
            param.grad.zero_()


num_examples = 10
# Same input distribution as before: x ~ N(0, 5^2)
features, labels = make_data(0, 5, num_examples)
# Initialize parameters w = (w_1, w_2) and b = (b_1, b_2)
w = torch.normal(0, 0.01, size=(2, 1), requires_grad=True)
b = torch.zeros(2, 1, requires_grad=True)
batch_size = 5
lr = 0.001  # learning rate
num_epochs = 10
net = model
loss = squared_loss

# Training loop
for epoch in range(num_epochs):
    for X, y in data_iter(batch_size, features, labels):
        l = loss(net(X, w, b), y)
        l.sum().backward()
        sgd([w, b], lr, batch_size)
    with torch.no_grad():
        train_l = loss(net(features, w, b), labels)
        print(f'epoch {epoch + 1}, loss {float(train_l.mean()):f}')
print(f"w: {w} b: {b}")

with torch.no_grad():
    labelshat = model(features, w, b)
backend_inline.set_matplotlib_formats('svg')
plt.rcParams['figure.figsize'] = (3.5, 2.5)
fig, ax = plt.subplots()
ax.scatter(features.numpy(), labels.numpy(), label='labels')
ax.scatter(features.numpy(), labelshat.numpy(), label='predictions')
ax.legend()
plt.show()
```
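Two PyTorch behaviors matter for a hand-written update like `sgd`: each `backward()` call adds to the stored `.grad` instead of overwriting it, and an in-place update of a leaf tensor with `requires_grad=True` must run inside `torch.no_grad()`. The d2l `sgd` helper handles both by wrapping the update in `with torch.no_grad():` and calling `param.grad.zero_()` after each step. This sketch shows both behaviors on a single scalar parameter; the tensor values are arbitrary.

```python
import torch

# A single scalar parameter and a fixed input, chosen only for illustration
w = torch.tensor(2.0, requires_grad=True)
x = torch.tensor(3.0)

# d/dw of (w * x)**2 / 2 is w * x**2 = 18
((w * x) ** 2 / 2).backward()
first = w.grad.clone()

# A second backward() without zeroing adds to the stored gradient
((w * x) ** 2 / 2).backward()
accumulated = w.grad.clone()

# Resetting the gradient first gives the fresh value again
w.grad.zero_()
((w * x) ** 2 / 2).backward()
fresh = w.grad.clone()

# An in-place update of a leaf tensor that requires grad is rejected by autograd
raised = False
try:
    w -= 0.1 * w.grad
except RuntimeError:
    raised = True

# The same update is allowed when autograd is not recording
with torch.no_grad():
    w -= 0.1 * w.grad

print(first.item(), accumulated.item(), fresh.item(), raised)  # 18.0 36.0 18.0 True
```

Without the `zero_()` call, every minibatch step would use the sum of all gradients computed so far, which can make a hand-written SGD loop behave erratically.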

You could increase the number of epochs to fit the labels better, but the model won't be able to learn the targets “perfectly”: they contain negative values, while you are using a `relu` at the model output, which clips the output at `max(x, 0)`.

Thank you! I was so focused on the implementation that I completely missed that the output is clipped to non-negative values. I'll increase the number of epochs and also try different piecewise linear models to get a better intuition for how many parameters to use. I'll probably end up with a fairly wide network with a couple of layers, lol. Thanks again!
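To make the clipping argument concrete: every ReLU output is non-negative, so the prediction satisfies `y_hat >= 0`, and for any label `y < 0` the best achievable per-example loss is `(0 - y)**2 / 2`, no matter how long training runs. The pure-Python sketch below computes that floor for the piecewise targets from `make_data`, and shows one option (an assumption, not the only fix): adding output weights and a bias, `v_1 ReLU(w_1 x + b_1) + v_2 ReLU(w_2 x + b_2) + c`, which lets the output go negative. The names `label`, `loss_floor`, and `model_with_output_layer` and all parameter values are hypothetical.

```python
import random


def label(x):
    # Same piecewise targets as make_data in the question
    if x < 0:
        return x + 2
    elif x < 5:
        return 2 * x - 1
    return -x + 1


def loss_floor(xs):
    # Mean squared loss (with the 1/2 factor) of the best non-negative predictor:
    # predict y where y >= 0, and 0 (the closest allowed value) where y < 0
    return sum(min(label(x), 0.0) ** 2 / 2 for x in xs) / len(xs)


def relu(z):
    return max(z, 0.0)


def model_with_output_layer(x, w, b, v, c):
    # Hypothetical variant: weighted sum of the two ReLU units plus a bias,
    # so the output is no longer restricted to y_hat >= 0
    return v[0] * relu(w[0] * x + b[0]) + v[1] * relu(w[1] * x + b[1]) + c


random.seed(0)
xs = [random.gauss(0, 5) for _ in range(1000)]  # same input distribution as make_data
print(f"lowest achievable loss with y_hat >= 0: {loss_floor(xs):.3f}")

# At x = -3 the label is -1; the original form (v = (1, 1), c = 0) cannot go below 0
print(label(-3.0))
print(model_with_output_layer(-3.0, w=[-1.0, 0.0], b=[-2.0, 0.0], v=[1.0, 1.0], c=0.0))
print(model_with_output_layer(-3.0, w=[-1.0, 0.0], b=[-2.0, 0.0], v=[-1.0, 0.0], c=0.0))
```

Note that the targets from `make_data` also jump at `x = 0` (from 2 to -1) and at `x = 5` (from 9 to -4), while any weighted sum of ReLUs of affine functions is continuous, so even a wider network can only approximate those jumps.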