PyTorch MLP underfitting

Hi guys!
I’m new to PyTorch and I’m trying to implement an MLP regressor, but I cannot get the model to fit the data (the Boston housing prices dataset); it underfits badly.
I’ve tried the sklearn MLPRegressor and that worked fine, but my own implementation just doesn’t seem to work for some reason. I’ve tried increasing the complexity of the model and different hyperparameters, but the lowest sum of absolute errors I could reach is around 3300.
The sklearn model went all the way down to 20.

Here is the sklearn model code:

import torch
from torch import nn
from torch.utils.data import DataLoader
from sklearn.datasets import load_boston
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPRegressor
import numpy as np


  
if __name__ == '__main__':
  torch.manual_seed(42)
  
  X, y = load_boston(return_X_y=True)

  print(type(X), type(y))
  
  X = StandardScaler().fit_transform(X)


  print(X)

  model = MLPRegressor(
      hidden_layer_sizes=(256, 128, 64, 32), 
      # activation="relu",
      # solver="adam",
      warm_start = True,
      max_iter=50000,
      learning_rate_init=0.0001,
      solver='adam',
      learning_rate='constant',
      tol = 1e-8,
      n_iter_no_change = 100,
      verbose=True
      )

  model.fit(X, y)

  outputs = model.predict(X)
  targets = y
  
  print('Error : ', np.sum(np.abs(outputs-targets)))

  print('Training process has finished.')

Here is the PyTorch model code:

import torch
from torch import nn
from torch.utils.data import DataLoader
from sklearn.datasets import load_boston
from sklearn.preprocessing import StandardScaler

      
class MLP(nn.Module):
  def __init__(self):
    super().__init__()
    self.layers = nn.Sequential(
      nn.Linear(13, 256),
      nn.ReLU(),
      nn.Linear(256, 128),
      nn.ReLU(),
      nn.Linear(128, 64),
      nn.ReLU(),
      nn.Linear(64, 32),
      nn.ReLU(),
      nn.Linear(32, 1)
    )


  def forward(self, x):
    return self.layers(x)

  
if __name__ == '__main__':
  torch.manual_seed(42)
  
  X, y = load_boston(return_X_y=True)

  X = StandardScaler().fit_transform(X)
  X = torch.tensor(X, dtype=torch.float32)
  y = torch.tensor(y, dtype=torch.float32)

  inputs = X
  targets = y
  
  mlp = MLP()
  
  loss_function = nn.MSELoss()
  optimizer = torch.optim.Adam(mlp.parameters(), lr=1e-4)

  print_cycle = 100

  num_epochs = 10000

  for epoch in range(0, num_epochs): 
    
    optimizer.zero_grad()
    
    outputs = mlp(inputs)
    
    loss = loss_function(outputs, targets)
    
    loss.backward()
    
    optimizer.step()
    
    if epoch % print_cycle == 0:
      print('Sum of errors:', torch.sum(torch.abs(outputs.squeeze().T-targets)).item(), 'Epoch : ', epoch, 'out of : ', num_epochs)

  print('Training process has finished.')

Thank you in advance!

One thing that looks a little strange to me is that your sum-of-errors calculation squeezes and transposes the output, but this doesn’t appear to be done before passing it to the loss function. Could you check whether the loss is decreasing even though the error stays high? If so, I would check that the loss function is being used in a way that is comparable to the error calculation.
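Something like the following quick check (standalone, with random data just standing in for the shapes in your code) makes the silent broadcasting visible:

```python
import torch
from torch import nn

# Illustrative shapes matching the code above: the final nn.Linear(32, 1)
# produces outputs of shape (N, 1), while the target tensor is (N,).
outputs = torch.randn(506, 1)
targets = torch.randn(506)

# nn.MSELoss broadcasts (506, 1) against (506,) to (506, 506), so the loss
# is computed over every output/target pair instead of matched pairs only.
print(nn.MSELoss()(outputs, targets))             # broadcasts (recent PyTorch warns about this)
print(nn.MSELoss()(outputs.squeeze(1), targets))  # shapes align: both (506,)
```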


Hey man!
You’re right, that was the problem: passing the squeezed and transposed predictions to the loss function solved it!
Thank you!
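For anyone finding this later, the corrected loss call looks roughly like this (a minimal sketch with a stand-in model, not the full script above):

```python
import torch
from torch import nn

mlp = nn.Linear(13, 1)            # stand-in for the MLP in the original post
loss_function = nn.MSELoss()

inputs = torch.randn(506, 13)
targets = torch.randn(506)

outputs = mlp(inputs)                              # shape (506, 1)
loss = loss_function(outputs.squeeze(1), targets)  # both shapes are now (506,)
# equivalently: loss = loss_function(outputs, targets.unsqueeze(1))
```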