I’m training the model using the following simple training loop:

def train(epochs, x, y):
    for epoch in range(epochs):
        # forward pass
        pred = net(x)
        # compute the loss
        loss = mse(pred, y)
        # backward pass
        loss.backward()
        # optimization step
        optimizer.step()
        # zero out the gradients to avoid gradient accumulation
        optimizer.zero_grad()
        print(f"Epoch: {epoch}\t Loss: {loss}")

train(5, x, y)
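For reference, a fully self-contained version of this loop might look like the sketch below. The network, loss, optimizer, and data here are hypothetical stand-ins (a tiny MLP, nn.MSELoss, plain SGD, and a toy quadratic target), since the thread doesn't show how `net`, `mse`, and `optimizer` were actually defined:

```python
import torch
from torch import nn

torch.manual_seed(0)

# toy data: y = x^2 plus a little Gaussian noise (stand-in for the real data)
x = torch.linspace(-2, 2, steps=20)[:, None]
y = x ** 2 + 0.1 * torch.randn_like(x)

# hypothetical small network; the real `net` from the thread may differ
net = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
mse = nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.05)

def train(epochs, x, y):
    losses = []
    for epoch in range(epochs):
        pred = net(x)              # forward pass
        loss = mse(pred, y)        # compute the loss
        loss.backward()            # backward pass
        optimizer.step()           # optimization step
        optimizer.zero_grad()      # clear gradients for the next epoch
        losses.append(loss.item())
        print(f"Epoch: {epoch}\t Loss: {loss.item():.4f}")
    return losses

losses = train(5, x, y)
```

Returning the per-epoch losses (rather than only printing them) makes it easy to check that training is actually making progress.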

The really weird issue is that the first time I run the process, the model doesn't learn anything and the loss stays the same. However, if I re-run the cell that defines and instantiates the model, the loss function, and the optimizer, everything works.
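One pattern that can produce exactly this symptom in a notebook (and this is only a guess, not necessarily what happened here) is an optimizer that is still bound to the parameters of an earlier model instance, e.g. because the cells were run out of order. A minimal sketch of that failure mode, with hypothetical names:

```python
import torch
from torch import nn

torch.manual_seed(0)
x = torch.randn(8, 1)
y = torch.randn(8, 1)

old_net = nn.Linear(1, 1)
optimizer = torch.optim.SGD(old_net.parameters(), lr=0.1)  # bound to old_net

net = nn.Linear(1, 1)          # model re-instantiated *after* the optimizer
before = net.weight.clone()

loss = nn.functional.mse_loss(net(x), y)
loss.backward()                # gradients land on net's parameters...
optimizer.step()               # ...but the optimizer only updates old_net's

# net.weight is untouched, so the loss never moves
print(torch.equal(before, net.weight))
```

Re-running the model/optimizer cell rebuilds the optimizer over the current model's parameters, which is consistent with training suddenly starting to work on the second run.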

Could you post a short, fully-self-contained, runnable script that reproduces
your issue together with the output you get?

(It seems like you’ve already posted most of what would be in such a script.)

This looks odd here. What are x and y? In range(epochs, x, y), x and y would typically be integers, while in net(x) and mse(pred, y), x and y would typically be PyTorch tensors. I don’t see how this code could run.

@KFrank, you’re right. This was a copy-paste error; I’ve edited the example. x and y are two PyTorch tensors, and they are now passed through the train function’s signature. I obtain them like this:

x = torch.linspace(-2, 2, steps=20)[:, None]
x = x.float()
y = add_noise(f(x), .3, 1.5)
y = y.float()
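The thread doesn't show the definitions of `f` and `add_noise`, so for anyone trying to reproduce this, here is a purely hypothetical stand-in that is at least consistent with the shapes and the call above (the real semantics of the two numeric arguments are unknown):

```python
import torch

def f(x):
    # placeholder target function; the actual f from the thread is not shown
    return x ** 2

def add_noise(t, mult, add):
    # hypothetical stand-in: the two numbers are treated here as scales for
    # multiplicative and additive Gaussian noise, which may not match the
    # original definition
    return t * (1 + mult * torch.randn_like(t)) + add * torch.randn_like(t)

x = torch.linspace(-2, 2, steps=20)[:, None]
x = x.float()
y = add_noise(f(x), .3, 1.5)
y = y.float()
```

With this stand-in, x and y both come out as (20, 1) float tensors, matching what the training loop expects.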

This sounds like flaky training, where convergence fails randomly. Did you try rerunning the code with different seeds to check the convergence rate?
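A quick way to run such a seed check is to wrap the whole setup (data, model, optimizer) in a function that re-seeds before each run. The sketch below uses hypothetical stand-ins for the model and data, since the originals aren't shown:

```python
import torch
from torch import nn

def final_loss(seed, epochs=50, lr=0.1):
    torch.manual_seed(seed)    # controls both the noise and the weight init
    x = torch.linspace(-2, 2, steps=20)[:, None]
    y = x ** 2 + 0.3 * torch.randn_like(x)
    net = nn.Sequential(nn.Linear(1, 8), nn.ReLU(), nn.Linear(8, 1))
    opt = torch.optim.SGD(net.parameters(), lr=lr)
    for _ in range(epochs):
        loss = nn.functional.mse_loss(net(x), y)
        loss.backward()
        opt.step()
        opt.zero_grad()
    return loss.item()

losses = [final_loss(s) for s in range(5)]
converged = sum(l < 1.0 for l in losses)
print(f"{converged}/5 seeds reached loss < 1.0")
```

If most seeds converge and only a few stall, the problem is flaky initialization rather than a bug in the loop itself.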

Also, based on your post, it seems the output could be negative if the noise is large enough, while the last relu would clip those values to zero, so you might want to remove it.
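To see why a trailing relu can stall training entirely: if the pre-activation happens to be negative for every sample, the relu outputs zero everywhere and its gradient is zero everywhere, so no signal flows back to the weights. A minimal illustration with a tiny hypothetical model whose bias is deliberately set negative:

```python
import torch
from torch import nn

lin = nn.Linear(1, 1)
with torch.no_grad():
    lin.weight.fill_(0.0)
    lin.bias.fill_(-1.0)       # pre-activation is -1 for every input

x = torch.linspace(-2, 2, steps=20)[:, None]
y = x ** 2
out = torch.relu(lin(x))       # all zeros: relu clips the negative outputs
loss = nn.functional.mse_loss(out, y)
loss.backward()

print(lin.weight.grad)         # exactly zero: nothing to learn from
```

Whether the real model starts in such a dead region depends on the random init, which would also explain why re-running the instantiation cell (i.e. re-drawing the init) sometimes fixes it.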