Some questions on linear regression

Hi there,
I started taking an online class on machine learning two weeks ago. The very first example the lecturer taught is a two-variable linear regression implemented in PyTorch. I am trying to modify it to handle a model that includes an interaction term:

Y=4.5X1 + 3X2 + 0.5X1X2 - 88

I started with three training points:
x_data = tensor([[1.0,2.0], [2.0,3.0], [3.0,4.0]])
y_data = tensor([[-76.5], [-67.0], [-56.5]])

from torch import nn
import torch
from torch import tensor

x_data = tensor([[1.0,2.0], [2.0,3.0], [3.0,4.0]])
y_data = tensor([[-76.5], [-67.0], [-56.5]])

class Model(nn.Module):
    def __init__(self):
        """
        In the constructor we instantiate a single nn.Linear module.
        """
        super().__init__()
        self.linear = nn.Linear(2, 1)  # two inputs (x1, x2), one output

    def forward(self, x):
        """
        In the forward function we accept a tensor of input data and must return
        a tensor of output data. We can use modules defined in the constructor as
        well as arbitrary operators on tensors.
        """
        y_pred = self.linear(x)
        return y_pred


# our model
model = Model()

# Construct our loss function and an optimizer. The call to model.parameters()
# in the SGD constructor returns the learnable parameters (weight and bias) of
# the nn.Linear module that is a member of the model.
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Training loop
for epoch in range(8000):
    # 1) Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x_data)

    # 2) Compute and print loss
    loss = criterion(y_pred, y_data)
    print(f'Epoch: {epoch} | Loss: {loss.item()} ')

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()


# After training
x_test = tensor([[4.0, 5.0]])
with torch.no_grad():  # no gradients needed for inference
    y_test = model(x_test)
print("Prediction (after training)", 4, 5, y_test.item())

After 8000 iterations, the loss saturates at 0.16666288673877716 and does not improve any further, and the prediction for the input (4, 5) is -46.66689682006836, which is a bit off from the exact value of -45. I believe this may be due to two reasons: 1) there is not enough training data; 2) the built-in linear model does not include the cross term?
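As a rough sanity check on these numbers (a sketch, not part of the course code): the three training points all satisfy x2 = x1 + 1, so a model without a cross term can only see them as three points along a single line, which I parametrize here with a hypothetical variable t = x1. An ordinary least-squares line through those three points, computed in plain Python, gives a residual and an extrapolation that can be compared with the numbers above.

```python
# Training points reduced to one variable t = x1 (x2 = t + 1 for all of them).
t = [1.0, 2.0, 3.0]
y = [-76.5, -67.0, -56.5]

n = len(t)
t_mean = sum(t) / n
y_mean = sum(y) / n

# Closed-form ordinary least squares for y ~ slope * t + intercept.
slope = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y)) / sum(
    (ti - t_mean) ** 2 for ti in t
)
intercept = y_mean - slope * t_mean

# Sum of squared errors, matching MSELoss(reduction='sum').
sse = sum((slope * ti + intercept - yi) ** 2 for ti, yi in zip(t, y))

# Extrapolate to the test input (4, 5), i.e. t = 4.
pred_at_4 = slope * 4.0 + intercept

print(f"slope={slope:.4f} intercept={intercept:.4f}")
print(f"sse={sse:.6f} prediction at (4, 5)={pred_at_4:.4f}")
```

This gives a sum of squared errors of about 0.1667 and a prediction of about -46.67, close to what the training loop reports. That suggests the plateau is simply the best a model without the cross term can do on these points, rather than a sign that more iterations are needed.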

From the documentation for nn.Linear, the model is

y = wx + b

I think it only gives the relation y = w1x1 + w2x2 + … + wnxn + b, with no cross term x1x2, right?

If that is the case, does it mean I have to define the model myself so that it includes the cross term? This is quite confusing. I thought machine learning would find the best model for the training data, but if we have to specify the model in advance, it is no different from ordinary curve fitting. In this example I know the data include a cross term, but what if I don't know the functional form and want the best fit to be found for me? What should I do?
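One common way to handle a known cross term is to keep a linear model and feed it the product x1*x2 as an extra input column. Below is a minimal sketch of that idea. The helper name add_interaction and the 4x4 grid of training inputs are assumptions made for illustration (the targets are generated from the model formula above, not measured data), and the fit uses torch.linalg.lstsq instead of SGD only to keep the result deterministic.

```python
import torch

def add_interaction(x):
    """Append the product x1 * x2 as a third feature column (hypothetical helper)."""
    return torch.cat([x, x[:, :1] * x[:, 1:2]], dim=1)

# Assumed training inputs: a 4x4 grid, so x1 and x2 vary independently.
grid = torch.arange(1.0, 5.0, dtype=torch.float64)
x = torch.cartesian_prod(grid, grid)                 # shape (16, 2)

# Targets generated from the model in the question.
y = (4.5 * x[:, 0] + 3.0 * x[:, 1] + 0.5 * x[:, 0] * x[:, 1] - 88.0).unsqueeze(1)

# Design matrix with columns [x1, x2, x1*x2, 1]; the ones column acts as the bias.
features = add_interaction(x)
design = torch.cat([features, torch.ones(len(x), 1, dtype=torch.float64)], dim=1)

# Closed-form least squares; the solution entries are [w1, w2, w12, b].
coeffs = torch.linalg.lstsq(design, y).solution.squeeze(1)
print(coeffs)

# Prediction for the input (4, 5).
x_new = add_interaction(torch.tensor([[4.0, 5.0]], dtype=torch.float64))
pred = (x_new @ coeffs[:3] + coeffs[3]).item()
print(pred)
```

The output of add_interaction(x) could equally be passed to nn.Linear(3, 1) and trained with the loop above. Note that the three original points alone would not pin down four coefficients, since they all lie on the line x2 = x1 + 1.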

One last question: how can I get the fitted coefficients back after training?
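For reference, the learned parameters of an nn.Linear layer are exposed as its weight and bias attributes, so for the Model class above they would be model.linear.weight and model.linear.bias. A minimal sketch on a standalone layer (untrained here, so the printed numbers are just the random initial values):

```python
import torch
from torch import nn

torch.manual_seed(0)          # only to make the random initial values repeatable
linear = nn.Linear(2, 1)      # same layer shape as self.linear in the model

# weight has shape (out_features, in_features); bias has shape (out_features,)
w1, w2 = linear.weight.detach().squeeze(0).tolist()
b = linear.bias.item()
print(f"w1={w1:.4f}, w2={w2:.4f}, b={b:.4f}")

# Alternatively, iterate over every parameter by name.
params = {name: p.detach().clone() for name, p in linear.named_parameters()}
for name, value in params.items():
    print(name, value)
```

After training, the same attribute access on the trained model would return the fitted w1, w2 and b.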