Validation Loss is not decreasing in my RNN model

I am training an RNN to map sequential data to positive real values, but my validation loss is not decreasing. I have tried increasing the number of layers in the RNN and adding dropout, without success. Some possible reasons I can think of:

  • I am using WeightedRandomSampler incorrectly. My dataset is imbalanced: each observation is a variable-length sequence X with an associated response y, where y is a tensor of positive integers, and some of these integers occur far more often than others in the overall dataset. I think I’m conflating the regression task with classification strategies, which may keep the model from learning patterns in the dataset, but I’m not sure. Would the default sampler be the preferred strategy for this task?

  • The gradients are not being computed or applied properly during training. My approach is to use the RNN to transform the input sequence into a sequence of nonnegative real values, and then sum these values to produce the final prediction. However, when I compute the loss and call loss.backward(), I’m not sure the gradients propagate correctly through the sum. I find it difficult to work out the gradients by hand under this approach, and I have not found a working example online that trains a model to learn this kind of mapping. Some sample code:

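import torch
import torch.nn as nn

batch_size = 4
sequence_len = 10
num_features = 8
num_outputs = 1

# GRU with hidden size num_outputs, so it emits one value per time step
model = nn.GRU(num_features, num_outputs, batch_first=True)
criterion = nn.MSELoss()

x_test = torch.randn(batch_size, sequence_len, num_features)  # (batch, seq, features)
y_test = torch.randn(batch_size, num_outputs)                 # (batch, 1)

model_output, hidden = model(x_test)     # model_output: (batch, seq, 1)
y_pred = torch.sum(model_output, dim=1)  # sum over time steps -> (batch, 1)
loss = criterion(y_pred, y_test)
loss.backward()                          # backpropagate through the sum into the GRU weights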


In this case, is summing the per-step outputs and then computing the loss the correct way to get proper gradients for the model? If not, what would you recommend?
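
To make the question concrete, here is a minimal sketch of how the sum-over-time-steps approach could be checked for gradient flow. It is not my actual model: the SummedGRU wrapper, the hidden size of 16, the Linear + softplus head, the Adam optimizer, and the toy targets are all assumptions for illustration. The softplus head is there because a GRU's raw outputs lie in (-1, 1), so summing them directly does not guarantee a nonnegative prediction.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

batch_size, sequence_len, num_features, hidden_size = 4, 10, 8, 16


class SummedGRU(nn.Module):
    """Hypothetical wrapper: GRU -> nonnegative value per step -> sum over time."""

    def __init__(self, num_features, hidden_size):
        super().__init__()
        self.gru = nn.GRU(num_features, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):
        out, _ = self.gru(x)                    # (batch, seq, hidden)
        per_step = F.softplus(self.head(out))   # (batch, seq, 1), positive values
        return per_step.sum(dim=1)              # (batch, 1)


model = SummedGRU(num_features, hidden_size)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

x = torch.randn(batch_size, sequence_len, num_features)
y = torch.rand(batch_size, 1) * 10  # toy positive targets (assumed)

# Snapshot the parameters so we can confirm the optimizer step changes them.
before = [p.detach().clone() for p in model.parameters()]

optimizer.zero_grad()
y_pred = model(x)
loss = criterion(y_pred, y)
loss.backward()

# Every parameter should receive a nonzero gradient through the sum.
grad_norms = {name: p.grad.norm().item() for name, p in model.named_parameters()}
for name, g in grad_norms.items():
    print(f"{name}: grad norm = {g:.6f}")

optimizer.step()
changed = [not torch.equal(b, p.detach()) for b, p in zip(before, model.parameters())]
print("all parameters changed:", all(changed))
```

If every gradient norm is nonzero and all parameters change after optimizer.step(), that would suggest autograd handles the sum as intended, and that the problem is more likely in the data pipeline or the sampling than in the backward pass.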

I can provide a Colab notebook if anyone wants to view my development.
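
For the first point, this is a small sketch of inverse-frequency weighting with WeightedRandomSampler, so it is clear what kind of setup I am asking about. The toy targets, the fixed-length placeholder features, and the inverse-frequency weights are assumptions for illustration, not my real data (which has variable-length sequences and would need a custom collate function).

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

torch.manual_seed(0)

# Toy integer targets (assumed): the value 1 is heavily over-represented.
y = torch.tensor([1] * 80 + [2] * 15 + [3] * 5)
x = torch.randn(len(y), 8)  # placeholder features, fixed length for simplicity

# One weight per sample, inversely proportional to how often its target occurs.
values, counts = torch.unique(y, return_counts=True)
freq = dict(zip(values.tolist(), counts.tolist()))
sample_weights = torch.tensor([1.0 / freq[int(v)] for v in y], dtype=torch.double)

sampler = WeightedRandomSampler(sample_weights, num_samples=len(y), replacement=True)
loader = DataLoader(TensorDataset(x, y), batch_size=20, sampler=sampler)

# Collect the targets seen in one pass to compare against the raw counts.
drawn = torch.cat([yb for _, yb in loader])
print("original counts: ", counts.tolist())
print("resampled counts:", torch.bincount(drawn, minlength=4)[1:].tolist())
```

With these weights each target value is drawn about equally often, so the training batches follow a different target distribution than an unweighted validation set. That mismatch may be part of why the validation loss does not track the training loss.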