Why does the training loss decrease to zero while the validation loss increases to infinity in skorch?

I’m using skorch for a regression problem. I find it great, and I’m using it because it handles cross validation automatically. My project should be simple: I have 3 features and I’m predicting a single continuous value. My problem is that whatever I do, even with weight_decay regularisation, my validation loss increases rapidly while my training loss goes to nearly zero. I know that my model is overfitting, but how can I prevent this? I even tried batch normalization, but that didn’t work either. Any thoughts? I would appreciate it if someone could guide me on this. Thanks in advance.

Are you trying to solve your problem using neural nets?
If so, can you describe the network that you are using?

@mailcorahul yes, I’m using neural nets. It’s nothing complicated; it should be an easy task for a NN. This is the declaration of my network:

import torch.nn as nn


class BearingNetwork(nn.Module):

    def __init__(self, n_features, n_out):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(n_features, 10),
            nn.LeakyReLU(),
            nn.Linear(10, 8),
            nn.LeakyReLU(),
            nn.Linear(8, 5),
            nn.LeakyReLU(),
            nn.Linear(5, n_out),  # in_features matches the 5 units of the previous layer
        )

    def forward(self, x):
        out = self.model(x)
        return out
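
As a quick sanity check (a throwaway snippet, not part of my training code), the forward pass gives the expected output shape for 3 input features and a single target:

import torch

net = BearingNetwork(n_features=3, n_out=1)
print(net(torch.randn(4, 3)).shape)  # prints torch.Size([4, 1])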

and then I’m wrapping it in skorch’s NeuralNetRegressor:

from sklearn.metrics import mean_squared_error, r2_score
from skorch import NeuralNetRegressor
from skorch.callbacks import Checkpoint, EpochScoring, TrainEndCheckpoint
from skorch.helper import predefined_split
from torch import optim


def train():
    mse = EpochScoring(scoring=mean_squared_error, lower_is_better=True, name='MSE')
    r2 = EpochScoring(scoring=r2_score, lower_is_better=False, name='R2')

    coarse_checkpoint = Checkpoint(dirname='results/bearing/coarse')
    coarse_train_end_checkpoint = TrainEndCheckpoint(dirname='results/bearing/coarse')

    bearing_coarse_model = NeuralNetRegressor(
        # X_norm is my normalized training features; y_bearing is my target,
        # which I didn't normalize since I'm predicting a continuous value
        module=BearingNetwork(n_features=X_norm.shape[1],
                              n_out=y_bearing.shape[1]),
        device=device,
        batch_size=64,
        lr=0.01,
        optimizer=optim.Adam,
        optimizer__weight_decay=0.01,
        max_epochs=1000,
        # testset is a predefined validation set: 20% of the 10000 points in my dataset
        train_split=predefined_split(testset),
        callbacks=[mse, r2, coarse_checkpoint, coarse_train_end_checkpoint]
    )

    print(f"{'*'*10} start training the coarse model {'*'*10}")
    coarse_model_history = bearing_coarse_model.fit(X_norm, y=y_bearing)
    save_model(model=bearing_coarse_model,
               path='results/bearing/coarse/model/coarse_model.pkl')

    print(f"{'*'*10} End Training the Coarse Model {'*'*10}")


if __name__ == '__main__':
    train()
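
For context, testset is just the held-out 20% wrapped in a skorch Dataset so it can be passed to predefined_split; roughly like this (a sketch with assumed variable names, not my exact preprocessing code):

from sklearn.model_selection import train_test_split
from skorch.dataset import Dataset

# hold out 20% of the (already normalized) data for validation;
# the remaining 80% keeps the X_norm / y_bearing names used in fit()
X_norm, X_val, y_bearing, y_val = train_test_split(
    X_norm, y_bearing, test_size=0.2, random_state=42)

# wrap the held-out arrays so predefined_split(testset) can use them
testset = Dataset(X_val, y_val)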

Given that you have only 3 input features and your model is overfitting, can you try a shallower network (a single hidden layer, or two), something like the sketch below?
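
For example, a single-hidden-layer variant could look like this (just a sketch; the name ShallowBearingNetwork and the hidden width of 8 are illustrative choices, not tuned values):

import torch.nn as nn


class ShallowBearingNetwork(nn.Module):
    # one hidden layer instead of three
    def __init__(self, n_features, n_out, n_hidden=8):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(n_features, n_hidden),
            nn.LeakyReLU(),
            nn.Linear(n_hidden, n_out),
        )

    def forward(self, x):
        return self.model(x)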

@mailcorahul yes, I tried that, and also different hyperparameters, optimizers, activations, etc… I even used batch normalization, but none of that worked.