NaN loss with linear regression

Hello

I have a classification problem. My input is a sequence of length 341 and the output is one of three classes {0, 1, 2}. I want to train a linear regression model using PyTorch. I have the following class, but during training the loss values start out as ordinary numbers, then become inf, then NaN. I do not know how to fix that. I also tried initializing the weights of the linear model, but it's the same thing. I also tried different numbers of epochs (5, 20, 100); all are the same. Any suggestions?

import torch
import torch.nn as nn
from torch.autograd import Variable
from torch.optim import SGD

class regression(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        self.input_dim = input_dim
        # One layer
        self.linear = nn.Linear(input_dim, 1)

    def forward(self, x):
        y_pred = self.linear(x)
        return y_pred

criterion = torch.nn.MSELoss()

def fit(model, data_loader, optim, epochs):
    for epoch in range(epochs):

        for i, (X, y) in enumerate(data_loader):
            X = X.float()
            y = y.unsqueeze(1).float()
            X = Variable(X, requires_grad=True)
            y = Variable(y, requires_grad=True)
            # Make a prediction for the input X
            pred = model(X)
            #loss = (y-pred).pow(2).mean()
            loss = criterion(y, pred)
            optim.zero_grad()
            loss.backward()
            optim.step()
            print(loss)
            print(type(loss))
        # Give some feedback after each 5th pass through the data
        if epoch % 5 == 0:
            print("Epoch", epoch, f"loss: {loss}")
    return None
regnet = regression(input_dim=341)
optim = SGD(regnet.parameters(), lr=0.01)
fit(regnet, data_loader, optim=optim, epochs=5)
pred = regnet(torch.Tensor(test_set.data_info).float())
pred = pred.detach().numpy()

Hi Saida!

Because you are performing a regression with MSELoss, your
model should not be actively unstable. However, depending on
your data, your model could be underdetermined (especially if
you have fewer samples than parameters), and therefore it also
might not be actively stable.

I would suggest you add weight_decay to your SGD optimizer.
This is equivalent to adding an “L2” penalty to your loss function,
and should keep any underdetermined parameters from running
off to infinity.
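
For example, a minimal sketch (the lr and weight_decay values here are
just placeholders to tune for your data, not recommendations):

# weight_decay adds an L2 penalty on the weights -- equivalent to
# adding (weight_decay / 2) * ||w||^2 to the loss -- which keeps
# unconstrained parameters from drifting off to infinity.
optim = SGD(regnet.parameters(), lr=0.01, weight_decay=0.01)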

Some side comments:

If you really have a classification problem, you should consider not
trying to approach it as a regression. Do your class labels really
represent something numerical, such as $0, $1, $2, where “1” is
numerically between “0” and “2” (and indeed, halfway between)?
Or are your labels just arbitrary class labels, where, for example,
“0” means “dog,” “1” means “cat,” and “2” means “horse?”

If your labels are not naturally numerical, you should treat your
problem as a classification problem. You would want to add more
layers to your model, separated by non-linear “activations,” and use,
most likely, CrossEntropyLoss as your loss function.
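
A rough sketch of what that could look like (the hidden size and ReLU
activation are just illustrative choices, not something from your post):

import torch
import torch.nn as nn

class Classifier(nn.Module):
    def __init__(self, input_dim, num_classes=3, hidden_dim=64):
        super().__init__()
        # Two linear layers separated by a non-linear activation.
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, x):
        # Return raw scores (logits); CrossEntropyLoss applies
        # log-softmax internally.
        return self.net(x)

model = Classifier(input_dim=341)
criterion = nn.CrossEntropyLoss()

# In the training loop, y should be a 1-d tensor of integer class
# labels (0, 1, or 2) -- not unsqueezed and not converted to float:
# loss = criterion(model(X), y.long())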

There is no reason to set requires_grad = True for X and y. Any
gradients that you calculate for X and y just get ignored. (X and y
haven’t been added to your optimizer – nor should they be.)

Also, Variable is deprecated – it doesn’t really do anything anymore.
(To set requires_grad = True for X – but you don’t want to – just
use X.requires_grad = True.)
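
Concretely, the inner loop of your fit() can drop those lines entirely;
a sketch, keeping your regression setup otherwise unchanged:

for i, (X, y) in enumerate(data_loader):
    # Plain tensors are fine -- no Variable, no requires_grad.
    X = X.float()
    y = y.unsqueeze(1).float()

    pred = model(X)
    loss = criterion(pred, y)

    optim.zero_grad()
    loss.backward()
    optim.step()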

Best.

K. Frank