Deep Regressor not improving over epochs

Aditya_Shukla · November 22, 2020, 2:27pm

I am trying to write a regressor that uses a series of dense layers to predict a value.
The code is as follows:

The Network

import torch.nn as nn
import torch.nn.functional as f

class deep_regressor(nn.Module):
  def __init__(self):
    super(deep_regressor,self).__init__()
    self.linear_1 = nn.Linear(
                      in_features = 8,
                      out_features = 16
                    )
    self.linear_2 = nn.Linear(
                      in_features = 16,
                      out_features = 32
                    )
    self.output = nn.Linear(
                      in_features = 32,
                      out_features = 1
                    )
    
  def forward(self, input_tensor):
    tensor = self.linear_1(input_tensor)
    tensor = f.relu(tensor)

    tensor = self.linear_2(tensor)
    tensor = f.relu(tensor)

    tensor = self.output(tensor)
    return tensor

The Script:

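import math
import torch
from sklearn.metrics import mean_squared_error  # presumably scikit-learn's

# train_loader, df and target are defined elsewhere (not shown in the post)
neurons = deep_regressor()
optimizer = torch.optim.Adam(neurons.parameters(),lr = 0.01)
loss_function = nn.MSELoss()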
for epoch in range(10):
    number = 0
    total_loss = 0 
    accuracy = 0
    for data_rows,labels in train_loader:
        predictions = neurons(data_rows)
        calc_loss = loss_function(predictions,labels)

        optimizer.zero_grad() #resets the gradients

        calc_loss.backward()
        optimizer.step()

    print(
        "Epochs =\t" + str(epoch + 1)
    )
    preds = neurons(torch.as_tensor(df, dtype = torch.float32))
    preds = preds.detach().numpy()
    rmse = math.sqrt(mean_squared_error(preds,target))
    print(rmse)

The output per epoch is as follows:

Epochs =	1
78.64161989373262
Epochs =	2
78.3528279726827
Epochs =	3
78.3800896760905
Epochs =	4
78.35181985605024
Epochs =	5
78.24301382034795
Epochs =	6
78.52033108135714
Epochs =	7
78.39965093292
Epochs =	8
78.2766145950348
Epochs =	9
78.39979733373761
Epochs =	10
78.39885957650147

Why is this happening? How do I fix it?

Henry_Chibueze · November 22, 2020, 2:46pm

Let me explain something.
You see, neural networks are good and all, but they don’t always fit better than other machine learning models on every kind of problem.
Neural networks are best known for outperforming other models on image and text data. You know why? Because they are very good at feature extraction and pattern recognition.

Now, you are performing a regression task, which probably means your data is structured. In that case there are models that often work better on structured data: tree ensembles, especially gradient boosters (e.g. gradient boosting machines, XGBoost). Random forest is a related tree ensemble, but it uses bagging rather than boosting.
How do they work?
Well, they are an ensemble of weak learners, usually decision trees, that come together to form a strong learner. This ensemble can be used for regression and classification tasks on structured data, and can even give much better accuracy than other ML algorithms if you know what you are doing.

Text or Image data -> Neural networks.
Structured data -> Gradient Boosters.
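
To make the "ensemble of weak learners" idea concrete, here is a minimal sketch using scikit-learn's GradientBoostingRegressor. The data is synthetic and made up purely for illustration (8 features, to match the network in the question), and the hyperparameters are assumed values, not tuned recommendations.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for structured data (an assumption, not the poster's data).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Each shallow tree is a weak learner fitted to the errors left by the
# trees before it; their sum forms the final prediction.
model = GradientBoostingRegressor(
    n_estimators=200, max_depth=3, learning_rate=0.1, random_state=0
)
model.fit(X_train, y_train)

# Compare against a trivial baseline that always predicts the training mean.
gb_rmse = float(np.sqrt(mean_squared_error(y_test, model.predict(X_test))))
baseline_pred = np.full_like(y_test, y_train.mean())
baseline_rmse = float(np.sqrt(mean_squared_error(y_test, baseline_pred)))
print("boosted RMSE:", gb_rmse, "baseline RMSE:", baseline_rmse)
```

Comparing against the predict-the-mean baseline is a quick way to tell whether any model, boosted trees or a neural network, is learning anything at all from the features.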

Henry_Chibueze · November 22, 2020, 2:50pm

Although, in your neural network, 3 layers might be overkill for an ordinary regression task.
Then again, if your data has really deep underlying patterns that even you cannot figure out, you can use more layers (just don’t forget to add a dropout layer so the network doesn’t overfit the training data).
Also, you can add a non-linear transformation (nn.ReLU()) between your hidden layer and the output layer, though your forward pass already applies f.relu there.
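
A rough sketch of what adding dropout could look like, assuming the same 8 -> 16 -> 32 -> 1 layout as the network in the question. The class name and the dropout probability of 0.2 are made up for illustration.

```python
import torch
import torch.nn as nn


class DeepRegressorWithDropout(nn.Module):
    """Hypothetical variant of the original network with dropout added."""

    def __init__(self, p_drop=0.2):  # p_drop is an assumed value
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(8, 16), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(16, 32), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(32, 1),  # no activation on the regression output
        )

    def forward(self, x):
        return self.net(x)


model = DeepRegressorWithDropout()
batch = torch.randn(4, 8)  # 4 rows, 8 features

model.train()  # dropout is active: units are randomly zeroed
train_out = model(batch)

model.eval()  # dropout is disabled: outputs are deterministic
with torch.no_grad():
    eval_out_1 = model(batch)
    eval_out_2 = model(batch)
```

Note the switch between model.train() and model.eval(): dropout should only be active while training, so any per-epoch RMSE check would normally run in eval mode.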

Hope these answers help.

Aditya_Shukla · November 22, 2020, 3:25pm

I have already tried bagging and boosting. By your logic, time series forecasting (which also happens to be a form of regression) should not be done using neural networks.

Here is an example related to standard regression: https://ieeexplore.ieee.org/document/5596936

I think they know what they are doing!!

Henry_Chibueze · November 22, 2020, 4:45pm

Actually, there’s a special kind of neural network for time series data: recurrent neural networks (RNNs), specifically the LSTM (Long Short-Term Memory) and the GRU (Gated Recurrent Unit).

The reason these are preferred for time series data is that RNNs are good at recognizing sequential patterns and predicting the next element, because they store information from previous steps in the sequence, which serves as the dependency for the next prediction.
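
As a minimal sketch of that idea, assuming a univariate series cut into fixed-length windows: the LSTM reads a window step by step, and its final hidden state (the "memory" of the earlier steps) is mapped to a prediction of the next value. The class name, window length and hidden size are all made up for illustration.

```python
import torch
import torch.nn as nn


class LSTMForecaster(nn.Module):
    """Hypothetical one-step-ahead forecaster for illustration only."""

    def __init__(self, n_features=1, hidden_size=16):
        super().__init__()
        self.lstm = nn.LSTM(
            input_size=n_features, hidden_size=hidden_size, batch_first=True
        )
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, window):
        # window: (batch, time_steps, n_features)
        _, (h_n, _) = self.lstm(window)
        # h_n[-1] is the last layer's final hidden state: (batch, hidden_size)
        return self.head(h_n[-1])


forecaster = LSTMForecaster()
windows = torch.randn(5, 12, 1)   # 5 windows of 12 time steps each
next_values = forecaster(windows)  # one predicted value per window
print(next_values.shape)
```

Swapping nn.LSTM for nn.GRU would need only a small change in forward, since nn.GRU returns a single hidden state tensor instead of a (hidden, cell) pair.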

Henry_Chibueze · November 22, 2020, 4:50pm

I’m not by any means trying to say that neural networks can’t or shouldn’t be used for regression analysis.
What I’m saying is that gradient boosting ensembles tend to work better on structured data, and I’ve actually experienced that myself a lot of times.

Aditya_Shukla · November 23, 2020, 5:03am

I absolutely agree with you. The problem is that, in this case, they (the ensembles) are not able to identify the pattern (perhaps I should have mentioned this). Hence the overkill dense layers.

As for RNNs, I know about them; I am using them with CTC loss and a decoder. I too dislike overkill solutions and go about flagging such questions on StackOverflow, but in this case I think they might be the solution.

Henry_Chibueze · November 23, 2020, 10:39am

I see.
So what’s the context of your data? Let me see if I can come up with any suggestions.