# NaN loss with linear regression

Hello

I have a classification problem. My input is a sequence of length 341 and the output is one of three classes {0, 1, 2}. I want to train a linear regression model using PyTorch. I have the following class, but during training the loss values start out as ordinary numbers, then become inf, then NaN. I do not know how to fix that. I also tried initializing the weights of the linear model, but the result is the same. I also tried different numbers of epochs (5, 20, 100), and all behave the same way. Any suggestions?

```python
class regression(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        self.input_dim = input_dim
        # One layer
        self.linear = nn.Linear(input_dim, 1)

    def forward(self, x):
        y_pred = self.linear(x)
        return y_pred

criterion = torch.nn.MSELoss()

for epoch in range(epochs):

    for i, (X, y) in enumerate(data_loader):
        X = X.float()
        y = y.unsqueeze(1).float()
        # Make a prediction for the input X
        pred = model(X)
        #loss = (y-pred).pow(2).mean()
        loss = criterion(y, pred)
        loss.backward()
        optim.step()
        print(loss)
        print(type(loss))
    # Give some feedback after each 5th pass through the data
    if epoch % 5 == 0:
        print("Epoch", epoch, f"loss: {loss}")
return None
regnet = regression(input_dim=341)
optim = SGD(regnet.parameters(), lr=0.01)
pred = regnet(torch.Tensor(test_set.data_info).float())
pred = pred.detach().numpy()
```

Hi Saida!

Because you are performing a regression with `MSELoss`, your
model should not be actively unstable. However, depending on
data, your model could be underdetermined – especially if you
have fewer samples than parameters – and therefore also might
not be actively stable.

I would suggest you add `weight_decay` to your `SGD` optimizer.
This is equivalent to adding an “L2” penalty to your loss function,
and should keep any underdetermined parameters from running
off to infinity.
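
As a rough illustration of the `weight_decay` suggestion, here is a minimal sketch assuming the 341-feature, single-output linear model from the question. The value `1e-3` is only an illustrative starting point, not a tuned setting, and the random batch is stand-in data rather than the real dataset.

```python
import torch
from torch import nn
from torch.optim import SGD

torch.manual_seed(0)

# Same shape as the model in the question: 341 inputs, 1 output.
model = nn.Linear(341, 1)

# weight_decay adds weight_decay * param to each parameter's gradient,
# which corresponds to an L2 penalty on the parameters.
# 1e-3 is an assumed example value.
optim = SGD(model.parameters(), lr=0.01, weight_decay=1e-3)
criterion = nn.MSELoss()

# Stand-in batch (random data, for illustration only).
X = torch.randn(8, 341)
y = torch.randint(0, 3, (8, 1)).float()

optim.zero_grad()              # clear gradients left over from earlier steps
loss = criterion(model(X), y)  # MSELoss takes (prediction, target)
loss.backward()
optim.step()
```

Note that `weight_decay` set this way applies to every parameter passed to the optimizer, including the bias.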

If you really have a classification problem, you should consider not
trying to approach it as a regression. Do your class labels really
represent something numerical, such as \$0, \$1, \$2, where “1” is
numerically between “0” and “2” (and indeed, halfway between)?
Or are your labels just arbitrary class labels, where, for example,
“0” means “dog,” “1” means “cat,” and “2” means “horse”?

If your labels are not naturally numerical, you should treat your
problem as a classification problem. You would want to add more
layers to your model, separated by non-linear “activations,” and use,
most likely, `CrossEntropyLoss` as your loss function.
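
A classification setup along those lines might look like the following sketch. The class name `Classifier`, the hidden size of 64, and the choice of `ReLU` are assumptions made for illustration; only the input length (341) and the three classes come from the question. The batch is random stand-in data.

```python
import torch
from torch import nn

torch.manual_seed(0)

class Classifier(nn.Module):
    """Two linear layers separated by a non-linear activation (hypothetical example)."""

    def __init__(self, input_dim, hidden_dim=64, num_classes=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, x):
        # Return raw scores (logits), one per class; no softmax here,
        # because CrossEntropyLoss applies log-softmax internally.
        return self.net(x)

model = Classifier(input_dim=341)
criterion = nn.CrossEntropyLoss()

# Stand-in batch: float inputs, integer class labels of shape (batch,).
X = torch.randn(8, 341)
y = torch.randint(0, 3, (8,))

logits = model(X)            # shape (8, 3)
loss = criterion(logits, y)  # expects (logits, integer class indices)
loss.backward()

# Predicted class is the index of the largest logit.
pred_classes = logits.argmax(dim=1)
```

Unlike the regression version, the labels stay as integer class indices here: no `unsqueeze(1)` and no `.float()`.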

There is no reason to set `requires_grad = True` for `X` and `y`. Any
gradients that you calculate for `X` and `y` just get ignored. (`X` and `y`
are data, not parameters that the optimizer updates.)
Also, `Variable` is deprecated – it doesn’t really do anything anymore.
(To set `requires_grad = True` for `X` – but you don’t want to – just
use `X.requires_grad = True`.)
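
To make the point about `requires_grad` concrete, here is a small sketch (with random stand-in data and a bare `nn.Linear` in place of the model from the question) showing that plain tensors can be fed to the model directly, and that only the model's parameters end up with gradients.

```python
import torch
from torch import nn

torch.manual_seed(0)

model = nn.Linear(341, 1)

# Plain tensors: no Variable wrapper and no requires_grad needed for the data.
X = torch.randn(4, 341)
y = torch.randn(4, 1)

loss = nn.functional.mse_loss(model(X), y)
loss.backward()

# The data tensors do not track gradients ...
print(X.requires_grad, X.grad)
# ... while the model parameters do so by default.
print(model.weight.requires_grad, model.weight.grad.shape)
```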