Loss not converging

I want to build a model that outputs a real value.

My input is n x 3 and the targets are in the range -40 to -140.

This is my model:

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

input_size = 3       # one row of the n x 3 input
hidden_size_1 = 64   # the hidden sizes are placeholders; I tried several values
hidden_size_2 = 32
hidden_size_3 = 16

class Regressor(nn.Module):
    def __init__(self):
        super(Regressor, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size_1)
        self.fc2 = nn.Linear(hidden_size_1, hidden_size_2)
        self.fc3 = nn.Linear(hidden_size_2, hidden_size_3)
        self.fc4 = nn.Linear(hidden_size_3, 1)

    def forward(self, x):
        # three ReLU hidden layers, linear output for regression
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = self.fc4(x)
        return x

I am using the Adam optimizer (with nn.MSELoss as the criterion):

model3 = Regressor()
opt = optim.Adam(model3.parameters(), lr=0.001)
mse_loss = nn.MSELoss()

Training sequence:

for epoch in range(10000):
    avg_loss_train = 0
    avg_loss_test = 0
    for i in range(X_train.shape[0]):
        data_x = X_train_tensor[i]
        data_y = y_train_tensor[i]

        pred = model3(data_x)
        loss = mse_loss(pred, data_y)

        opt.zero_grad()
        loss.backward()
        opt.step()

        # track the loss on one held-out sample alongside the training loss
        with torch.no_grad():
            pred_test = model3(X_test_tensor[0])
            loss_test = mse_loss(pred_test, y_test_tensor[0])
        avg_loss_train += loss.item()
        avg_loss_test += loss_test.item()

    if epoch % 50 == 0:
        # both sums have X_train.shape[0] terms
        print('loss test {} loss train {}'.format(
            avg_loss_test / X_train.shape[0],
            avg_loss_train / X_train.shape[0]))

Now the problem is that my loss is not converging; it always gets stuck around 176. I have tried many learning rates, different numbers of layers, different activation functions, and different numbers of nodes, but it still hovers around 176. And yes, I normalized the input data (but not the output data).

What should I do? Please help.

You might try to normalize the output as well (and denormalize it for the validation/test case).
Also, did you manage to overfit your model on the training data?
I’m not sure which dataset you are using, but you could also try to scale down the model a bit and start with a single hidden layer.
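
Roughly like this, reusing the tensor names from your snippet (the standardization itself is just one option; any affine scaling works):

# compute the statistics on the training targets only
y_mean = y_train_tensor.mean()
y_std = y_train_tensor.std()

# inside the training loop: train against the normalized target
pred = model3(data_x)
loss = mse_loss(pred, (data_y - y_mean) / y_std)

# validation/test: denormalize the prediction before comparing to y_test_tensor
with torch.no_grad():
    pred_test = model3(X_test_tensor[0]) * y_std + y_mean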

@ptrblck
Hey,

Thanks for the reply.

Yes, I did try a single hidden layer with only 5-10 neurons as well, but the loss was still around 179, and all of my predictions were almost the same, even though the data is quite dispersed.
The data I am using is private, but its values are quite random.

And no, I was not able to overfit the model.

How large is your dataset? Could you take a small sample (e.g. just 10 samples) and try to overfit it first?
If that doesn’t work at all, you might have a bug in your code somewhere, or the model architecture might not be suitable for the data.
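
E.g. something like this, reusing model3, opt, and mse_loss from your code (the 10-sample slice is just an example):

# fixed tiny subset; the loss should go to ~0 if everything is wired up correctly
x_small = X_train_tensor[:10]
y_small = y_train_tensor[:10].view(-1, 1)  # match the (10, 1) model output

for step in range(1000):
    pred = model3(x_small)
    loss = mse_loss(pred, y_small)
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 100 == 0:
        print('step {} loss {}'.format(step, loss.item()))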

@ptrblck
The dataset contains more than 20 million rows.
OK, I will try to overfit it and let you know.

@ptrblck

Hey, your advice to normalize the output data worked; the predictions now make some sense.
Just curious: is it necessary to normalize the output data when working with regression?

Not necessarily, but normalization might help, e.g. if your output value range is quite large.
Your model might have a hard time pushing its parameters to the high values that would be necessary for a high-range output.
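
As a quick illustration (my own sketch, not from your code): a freshly initialized linear head predicts values close to zero, so targets around -40 to -140 are far away, and the optimizer first has to drag the output bias there:

import torch
import torch.nn as nn

torch.manual_seed(0)
head = nn.Linear(3, 1)        # default init keeps weights and bias small
x = torch.randn(8, 3)
print(head(x).abs().max())    # predictions are on the order of 1
# targets in [-140, -40] are far from these initial predictions,
# so early training mostly shifts the bias instead of learning structure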


Hello Sir,

Sorry if this sounds dumb, but if I normalise my output values, how will I predict on the test data? Will the predictions be normalised as well? How do I recover the actual values?

If you normalize the target during training, the model learns to predict in the normalized range, so you “denormalize” the predictions to get back the expected value range.
E.g. if you’ve standardized the targets by subtracting the mean and dividing by the stddev:

x_norm = (x - mean) / std

you might undo it via:

x = x_norm * std + mean
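
If your targets are NumPy arrays anyway, scikit-learn’s StandardScaler does this bookkeeping for you (a sketch; y_train and pred_norm are placeholder names):

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
# fit on the training targets only; StandardScaler expects a 2D array
y_train_norm = scaler.fit_transform(y_train.reshape(-1, 1))

# ... train against y_train_norm ...

# map the model's normalized predictions back to the original scale
pred = scaler.inverse_transform(pred_norm.reshape(-1, 1))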