Trying to do a linear regression with multiple inputs and one output

I am trying to create a linear regression model that predicts injury time in a football match, but my results are quite bad:
Error on validation data
MSE: 1.1520577669143677
MAE: 0.7984767556190491
MAPE: 35.94094467163086 %
Error on test data
MSE: 1.2277499437332153
MAE: 0.8027499914169312
MAPE: 41.30732345581055 %

Can somebody help me with my code to improve my results?

import matplotlib.pyplot as plt
import pandas as pd
import torch
from torch.utils.data import DataLoader
from math import sqrt

# We set a fixed seed for repeatability
random_seed = 12345  # This seed is also used in the pandas sample() method below

df = pd.read_csv('data/injuryTimeDataset.csv', index_col=0)


# Split data into a train, validation and test set

# Test set
test_set = df.iloc[8000:12000]

# Make a copy of the dataset and remove the test set
train_val_set = df.copy().drop(test_set.index)

# Randomly sample validation data without replacement (10%)
val_set = train_val_set.sample(frac=0.1, replace=False, random_state=random_seed)

# Remaining data used for training (90%)
train_set = train_val_set.copy().drop(val_set.index)

# Check numbers add up
n_points = len(train_set) + len(val_set) + len(test_set)
# print(f'{len(df)} = {len(train_set)} + {len(val_set)} + {len(test_set)} = {n_points}')

# Plot the sets
plt.figure(figsize=(16, 9))
plt.scatter(train_set.index, train_set['declared_inj_time'], color='black', label='Train')
plt.scatter(val_set.index, val_set['declared_inj_time'], color='green', label='Val')
plt.scatter(test_set.index, test_set['declared_inj_time'], color='red', label='Test')


# Inputs and outputs
INPUT_COLS = ['goals', 'corners', 'free_kicks', 'substitutions']
OUTPUT_COL = ['declared_inj_time']

# Linear regression model
class LinearRegression(torch.nn.Module):
    def __init__(self):
        # Required so that torch.nn.Module can register the layer's parameters
        super().__init__()
        self.device = 'cuda' if torch.cuda.is_available() else 'cpu'
        self.linear = torch.nn.Linear(in_features=len(INPUT_COLS), out_features=len(OUTPUT_COL))

    def forward(self, x):
        y_pred = self.linear(x)
        return y_pred

# Training loop
def train(
    model: torch.nn.Module,
    train_loader: DataLoader,
    val_loader: DataLoader,
    n_epochs: int,
    lr: float,
) -> torch.nn.Module:
    # Loss and optimizer
    criterion = torch.nn.MSELoss(reduction='mean')
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)

    # Train weights
    for epoch in range(n_epochs):
        for inputs, labels in train_loader:
            # Zero the parameter gradients
            optimizer.zero_grad()

            # Forward propagation
            pred_y = model(inputs)

            # Compute loss
            loss = criterion(pred_y, labels)

            # Backward propagation to compute gradient
            loss.backward()

            # Update parameters
            optimizer.step()

        # Evaluate model on validation data
        mse_val = 0
        for inputs, labels in val_loader:
            mse_val += torch.sum(torch.pow(labels - model(inputs), 2)).item()
        mse_val /= len(val_loader.dataset)
        print(f'Epoch: {epoch + 1}: Val MSE: {mse_val}')

    return model

# Prepare data for training
x_train = torch.from_numpy(train_set[INPUT_COLS].values).to(torch.float)
y_train = torch.from_numpy(train_set[OUTPUT_COL].values).to(torch.float)
x_val = torch.from_numpy(val_set[INPUT_COLS].values).to(torch.float)
y_val = torch.from_numpy(val_set[OUTPUT_COL].values).to(torch.float)

# Create dataset loaders
train_dataset = torch.utils.data.TensorDataset(x_train, y_train)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=10, shuffle=True)
val_dataset = torch.utils.data.TensorDataset(x_val, y_val)
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=len(val_set), shuffle=False)

# Initialize model
model = LinearRegression()

# Train model
n_epochs = 100
lr = 0.0001
model = train(model, train_loader, val_loader, n_epochs, lr)

# Evaluate model

# Predict on validation data
pred_val = model(x_val)

# Compute MSE, MAE and MAPE on validation data
print('Error on validation data')
mse_val = torch.mean(torch.pow(pred_val - y_val, 2))
print(f'MSE: {mse_val.item()}')
mae_val = torch.mean(torch.abs(pred_val - y_val))
print(f'MAE: {mae_val.item()}')
mape_val = 100 * torch.mean(torch.abs(torch.div(pred_val - y_val, y_val)))
print(f'MAPE: {mape_val.item()} %')

# Evaluate model on test data

# Get input and output as torch tensors
x_test = torch.from_numpy(test_set[INPUT_COLS].values).to(torch.float)
y_test = torch.from_numpy(test_set[OUTPUT_COL].values).to(torch.float)

# Make prediction
pred_test = model(x_test)

# Compute MSE, MAE and MAPE on test data
print('Error on test data')
mse_test = torch.mean(torch.pow(pred_test - y_test, 2))
print(f'MSE: {mse_test.item()}')
mae_test = torch.mean(torch.abs(pred_test - y_test))
print(f'MAE: {mae_test.item()}')
mape_test = 100 * torch.mean(torch.abs(torch.div(pred_test - y_test, y_test)))
print(f'MAPE: {mape_test.item()} %')

Your target variable is conceptually discrete, so gradient descent on an MSE loss struggles, as MSE implicitly fits the mean of a Gaussian distribution. Poisson regression (see PoissonNLLLoss) or reformulating the task as classification should work better.

Note that you could use something like LinearRegression from sklearn, which returns a closed-form solution without doing mini-batch gradient descent.
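To illustrate the closed-form route, here is a minimal sketch using numpy's least-squares solver, which minimizes the same squared error that sklearn's LinearRegression does. The data is synthetic and the values in true_w and true_b are made up; the four columns only stand in for goals, corners, free kicks and substitutions, so treat the numbers as an assumption and not as anything measured on the real dataset.

```python
import numpy as np

# Synthetic stand-in data (an assumption; the real dataset is not used here).
rng = np.random.default_rng(12345)
n = 2000
X = rng.poisson(lam=[3.0, 10.0, 25.0, 5.0], size=(n, 4)).astype(float)
true_w = np.array([0.5, 0.05, 0.02, 0.3])  # hypothetical effect of each event
true_b = 1.0                               # hypothetical base injury time
y = X @ true_w + true_b + rng.normal(scale=1.0, size=n)

# Append a column of ones so the intercept is estimated together with the weights.
X1 = np.hstack([X, np.ones((n, 1))])

# One call gives the least-squares solution: no learning rate, epochs or batches.
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
weights, intercept = coef[:-1], coef[-1]

pred = X1 @ coef
mse = float(np.mean((pred - y) ** 2))
print("weights:", weights)
print("intercept:", intercept)
print("train MSE:", mse)
```

The fitted weights can be read directly as "extra injury time per additional event", which is the kind of measure asked for in this thread.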

Thank you very much!
If I were to reformulate the task as a classification task, what would that look like? The goal of my task is to get a measure of how much each event affects injury time.

Would a neural network give a better estimate?

Also, are there any loss functions for a non-Gaussian response that I could use?

I changed the criterion to PoissonNLLLoss and the results did not improve. Any other tips?
Error on validation data
MSE: 4.964473247528076
MAE: 1.8417545557022095
MAPE: 61.19207763671875 %
Error on test data
MSE: 4.665085792541504
MAE: 1.7789844274520874
MAPE: 60.499691009521484 %

By default, PoissonNLLLoss treats the model output as the log of the Poisson rate, so exp(output) should be used to recover the [positive] rates when you predict. And I'm not sure how meaningful these metrics are with a Poisson distribution.
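A minimal sketch of that point, on synthetic counts (the data, true_w, the learning rate and the number of steps are all assumptions chosen for illustration): the model is trained with PoissonNLLLoss, and the error is computed once on the raw output and once on exp(output).

```python
import torch

torch.manual_seed(12345)

# Synthetic count data (an assumption standing in for the real dataset).
n = 1000
x = torch.rand(n, 4)
true_w = torch.tensor([[0.8], [0.2], [0.1], [0.5]])  # hypothetical coefficients
y = torch.poisson(torch.exp(x @ true_w + 0.5))

model = torch.nn.Linear(4, 1)
# log_input=True is the default: the model output is interpreted as log(rate).
criterion = torch.nn.PoissonNLLLoss(log_input=True)
optimizer = torch.optim.Adam(model.parameters(), lr=0.05)

# Full-batch training keeps the sketch short.
for _ in range(500):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()

with torch.no_grad():
    log_rate = model(x)
    rate = torch.exp(log_rate)  # predicted counts, always positive
    mae_raw = torch.mean(torch.abs(log_rate - y)).item()
    mae_rate = torch.mean(torch.abs(rate - y)).item()

print(f"MAE using raw output:  {mae_raw:.3f}")
print(f"MAE using exp(output): {mae_rate:.3f}")
```

If the metrics in the question were computed on the raw model output, that alone could explain why they got worse after switching the criterion.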

For that goal, classification may not be a good fit, since it would introduce a separate set of coefficients per category.

Without hidden layers, a neural network is worse than a least-squares minimizer, as the latter reaches the global minimum directly. (And I don't think a hidden layer will help in your case, as your feature space is too small.)

It is actually possible that your initial result had converged and the fitted line is as good as you can get (with weak predictors). A large MSE is to be expected with separated targets.
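One way to check whether the predictors are simply weak is to compare the model's MSE with the MSE of always predicting the mean. The sketch below assumes a hypothetical helper, baseline_report, and uses made-up values for illustration only; for simplicity the baseline uses the mean of the evaluated targets, where the training mean would be the stricter choice.

```python
import numpy as np

def baseline_report(y_true, y_pred):
    """Return (model MSE, mean-baseline MSE, R^2) for the given predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse_model = float(np.mean((y_true - y_pred) ** 2))
    # Baseline: ignore all inputs and always predict the average target.
    mse_baseline = float(np.mean((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - mse_model / mse_baseline
    return mse_model, mse_baseline, r2

# Made-up targets and predictions, not taken from the real dataset.
y_true = [1, 2, 2, 3, 4, 3, 2, 5]
y_pred = [2.0, 2.5, 2.5, 2.8, 3.0, 2.8, 2.6, 3.2]

mse_model, mse_baseline, r2 = baseline_report(y_true, y_pred)
print(f"model MSE:    {mse_model:.4f}")
print(f"baseline MSE: {mse_baseline:.4f}")
print(f"R^2:          {r2:.4f}")
```

An R^2 close to zero would mean the inputs explain little of the variation in injury time, in which case a different loss or a larger model is unlikely to help much.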