LSTM model not learning and updating weights

I’m working on a time series forecasting project and am running into issues with my model not learning. Based on the weights’ progression in the screenshot (each entry represents an epoch), the weights seem to be stagnating.

This is my model:

import torch
import torch.nn as nn

class LSTMModel(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers):
        super(LSTMModel, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers)
        self.fc_1 = nn.Linear(hidden_size, 128)
        self.relu = nn.ReLU()
        self.fc = nn.Linear(128, 5)

    def forward(self, x):
        out, (hidden, cell) = self.lstm(x)
        out = out[:, -1, :]
        out = self.fc_1(out)
        out = self.relu(out)
        return self.fc(out)

model = LSTMModel(6, 64, 25)
loss_function = nn.CrossEntropyLoss(weight=class_weights)
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

model.train()
for x_batch, y_batch in train_data_loader:
    x_batch, y_batch = x_batch.to(device), y_batch.to(device)
    output = model(x_batch)
    loss = loss_function(output, y_batch)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
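
For reference, this is roughly how I have been checking the weight progression per epoch (a minimal sketch; picking lstm.weight_ih_l0 is just an example, any parameter would do):

prev = model.lstm.weight_ih_l0.detach().clone()
for x_batch, y_batch in train_data_loader:
    x_batch, y_batch = x_batch.to(device), y_batch.to(device)
    output = model(x_batch)
    loss = loss_function(output, y_batch)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
# Compare the parameter after one epoch with its value before the epoch.
curr = model.lstm.weight_ih_l0.detach()
print("max abs weight change this epoch:", (curr - prev).abs().max().item())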

Can anyone give me feedback on why the model doesn’t seem to be learning and the weights aren’t being updated?

The line

self.lstm = nn.LSTM(input_size, hidden_size, num_layers)

implies that you are using the default batch_first=False, so the expected input shape is (seq_len, batch_size, input_size), and the shape of out after

out, (hidden, cell) = self.lstm(x)

will be (seq_len, batch_size, hidden_size) (since you don’t use a bidirectional LSTM).

This now means the line

out = out[:, -1, :]

will yield an out with shape (seq_len, hidden_size). What you probably want is a shape of (batch_size, hidden_size). Therefore you need

out = out[-1]

I assume that your x_batch actually already has shape (batch_size, seq_len, input_size) – i.e., not the shape expected given that you initialized the nn.LSTM with batch_first=False (the default). If that is indeed the case, the simplest solution should be to change the code to

self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)

Note that in this case the line has to remain

out = out[:, -1, :]
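
If it helps, here is a minimal standalone sketch (toy tensors, not your actual data) that illustrates the two conventions and the resulting shapes:

import torch
import torch.nn as nn

x = torch.randn(32, 10, 6)  # (batch_size=32, seq_len=10, input_size=6)

# batch_first=True: take the last time step with out[:, -1, :]
lstm_bf = nn.LSTM(6, 64, 2, batch_first=True)
out_bf, _ = lstm_bf(x)
print(out_bf.shape)            # torch.Size([32, 10, 64])
print(out_bf[:, -1, :].shape)  # torch.Size([32, 64]) -> (batch_size, hidden_size)

# batch_first=False (default): the input must be (seq_len, batch_size, input_size)
lstm = nn.LSTM(6, 64, 2)
out, _ = lstm(x.transpose(0, 1))
print(out.shape)               # torch.Size([10, 32, 64])
print(out[-1].shape)           # torch.Size([32, 64]) -> (batch_size, hidden_size)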

Thanks @vdw

I made the changes you suggested, but the weights still look pretty much stagnant.

I would expect to see more variation; however, the weights seem to remain at 0.

UPDATE:
I added a print statement:

    loss.backward()
    print(model.lstm.weight_ih_l0.grad[0][0])

and it seems the gradients become very small, vanish to zero, and after that they stay there:

tensor(-2.2441e-27, device='cuda:0')
tensor(2.1074e-29, device='cuda:0')
tensor(6.9026e-28, device='cuda:0')
tensor(-4.6863e-29, device='cuda:0')
tensor(-3.3049e-28, device='cuda:0')
tensor(-1.1580e-29, device='cuda:0')
tensor(3.9923e-30, device='cuda:0')
tensor(1.4204e-27, device='cuda:0')
tensor(2.1197e-30, device='cuda:0')
tensor(-2.6489e-29, device='cuda:0')
tensor(-3.7362e-30, device='cuda:0')
tensor(1.1758e-30, device='cuda:0')
tensor(6.1142e-29, device='cuda:0')
tensor(-1.5144e-30, device='cuda:0')
tensor(2.3485e-31, device='cuda:0')
tensor(4.2870e-33, device='cuda:0')
tensor(-5.3474e-33, device='cuda:0')
tensor(-2.6197e-29, device='cuda:0')
tensor(-9.1210e-32, device='cuda:0')
tensor(5.1670e-28, device='cuda:0')
tensor(6.1556e-31, device='cuda:0')
tensor(-8.9021e-29, device='cuda:0')
tensor(-2.2017e-28, device='cuda:0')
tensor(-6.5199e-33, device='cuda:0')
tensor(4.5335e-31, device='cuda:0')
tensor(-4.3888e-32, device='cuda:0')
tensor(-1.5471e-32, device='cuda:0')
tensor(4.2088e-30, device='cuda:0')
tensor(-1.1161e-31, device='cuda:0')
tensor(9.0165e-31, device='cuda:0')
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
... (all remaining entries are tensor(0., device='cuda:0'))
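
To get a broader picture than a single weight entry, I am also thinking of logging the total gradient norm over all parameters inside the loop (a small sketch reusing the names from above):

    loss.backward()
    # Sum the squared gradients of every parameter and take the square root:
    # one number per batch. If this collapses to 0, gradients are vanishing
    # across the whole network, not just in lstm.weight_ih_l0.
    total_norm = torch.sqrt(sum(
        p.grad.detach().pow(2).sum()
        for p in model.parameters() if p.grad is not None
    ))
    print("grad norm:", total_norm.item())
    optimizer.step()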