Time series forecasting model not training properly

I am learning and trying to create a time series forecasting model using LSTMs. I am using the data from https://github.com/neelabhpant/Deep-Learning-in-Python/blob/master/USD_INR.csv. This is the code I have written so far.

import torch
from torch import nn
from torch.utils import data as torch_data
import torch.nn.functional as F
import pandas as pd

class PandasReader(torch_data.Dataset):
    def __init__(self, data, target):
        self.data = data
        self.target = target
        self.shape = data.shape

    def __getitem__(self, ix):
        row = self.data.iloc[ix]
        features = torch.tensor(row.drop(self.target).values)
        target = torch.tensor(row[[self.target]].values)
        return features.to(device), target.to(device)

    def __len__(self):
        return self.data.shape[0]

class RNN(nn.Module):
    def __init__(self, input_dim):
        super(RNN, self).__init__()
        self.lstm_1 = nn.LSTM(
            input_size=input_dim, hidden_size=64, num_layers=2, batch_first=True
        )
        self.fc_1 = nn.Linear(in_features=64, out_features=16)
        self.fc_2 = nn.Linear(in_features=16, out_features=1)
        self.dropout = nn.Dropout()
        self.hidden = None

    def forward(self, x):
        y, self.hidden = self.lstm_1(x, self.hidden)
        y = self.dropout(y)
        y = self.dropout(F.leaky_relu(self.fc_1(y)))
        y = self.dropout(F.leaky_relu(self.fc_2(y)))
        return y

def train(train_loader, val_loader, epochs):
    model = RNN(train_loader.dataset.shape[1] - 1).to(device)

    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

    for epoch in range(epochs):
        train_loss = 0.0
        val_loss = 0.0
        for _, (features, target) in enumerate(train_loader):
            x = features.unsqueeze(dim=0)
            y = model.forward(x).squeeze(dim=0)
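            loss = criterion(y, target)
            optimizer.zero_grad()
            # retain_graph=True is the workaround described in question 2 below
            loss.backward(retain_graph=True)
            optimizer.step()
            train_loss += loss.item()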
        train_loss = train_loss / len(train_loader)

        with torch.no_grad():
            for _, (features, target) in enumerate(val_loader):
                x = features.unsqueeze(dim=0)
                preds = model.forward(x).squeeze(dim=0)
                loss = criterion(preds, target)
                val_loss += loss.item()
            val_loss = val_loss / len(val_loader)

        print(f"EPOCH {epoch+1}: TRAIN LOSS - {train_loss}, VAL LOSS - {val_loss}")

    return model

if __name__ == "__main__":
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print("USING DEVICE:", device)


    df_train = pd.read_csv("./train.csv").drop("Date", axis="columns")
    train_loader = torch_data.DataLoader(
        PandasReader(df_train, target="Price"), batch_size=30, shuffle=True
    )

    df_val = pd.read_csv("./val.csv").drop("Date", axis="columns")
    val_loader = torch_data.DataLoader(
        PandasReader(df_val, target="Price"), batch_size=30, shuffle=True
    )

    print("TRAINING DATA SHAPE:", train_loader.dataset.shape)
    print("VALIDATION DATA SHAPE:", val_loader.dataset.shape)

    model = train(train_loader, val_loader, epochs=50)

The loss decreases for the first few epochs and then keeps bouncing up and down. I have a few questions:

  1. Do I need to shift the input data whenever trying to do time series forecasting? In other words, do I need to manually modify the data so that the previous day’s data is input to the model for the current day?

  2. I kept getting the error "Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time." I added retain_graph=True to loss.backward(), and then it worked fine. When should I use retain_graph=True? Is there a rule of thumb for it? I read somewhere on the forums that I should set it to False for the last iteration.

  3. Right now, it’s a stateful LSTM… and in the forward function, if I do self.hidden=None as the first step, then it becomes a stateless LSTM, right?

I would appreciate any corrections and suggestions for my code.

EDIT 1: The validation set is the data of year 2017 from the linked CSV file.

Two comments from a quick look (there may be more):

  • You shuffle your data points for batching, which probably mixes badly with the statefulness you mention.
  • The retain_graph issue you mention also comes from the statefulness. You could solve it using “detach”, but then you would not get backpropagation through time steps (or any signal for the hidden-to-hidden connections).
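
A minimal sketch of the “detach” idea, assuming made-up sizes and random data (none of the names or numbers below come from the code above). The hidden state is carried from one step to the next, but the graph is cut at each step boundary, so no retain_graph is needed and gradients stop at that boundary:

```python
import torch
from torch import nn

# Hypothetical sizes, chosen only for illustration.
lstm = nn.LSTM(input_size=3, hidden_size=8, batch_first=True)
head = nn.Linear(8, 1)
params = list(lstm.parameters()) + list(head.parameters())
optimizer = torch.optim.SGD(params, lr=0.01)
criterion = nn.MSELoss()

hidden = None  # no state before the first chunk
losses = []
for step in range(3):
    x = torch.randn(1, 5, 3)       # (batch, seq_len, features)
    target = torch.randn(1, 5, 1)  # one target per time step

    # Keep the state values but drop their history, so backward()
    # never reaches into the graph of a previous step.
    if hidden is not None:
        hidden = tuple(h.detach() for h in hidden)

    out, hidden = lstm(x, hidden)
    loss = criterion(head(out), target)

    optimizer.zero_grad()
    loss.backward()  # works without retain_graph=True
    optimizer.step()
    losses.append(loss.item())
```

Within each chunk of 5 time steps the gradient still flows through time; it is only the link between consecutive chunks that is cut.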

What you probably want to do is to feed several timesteps into your RNN and then extract one prediction.
I would recommend looking at the Time series prediction example, or at how fast.ai does this in their language modelling (keep in mind that fast.ai’s intention is to provide efficient access to their excellent ideas rather than to be a showcase of PyTorch coding style).
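
As a rough sketch of “several timesteps in, one prediction out” (the helper make_windows, the class WindowLSTM, and all sizes are hypothetical names and values for illustration, and a toy series stands in for the price column):

```python
import torch
from torch import nn

def make_windows(series, window):
    """Split a 1-D series into overlapping input windows and next-step targets."""
    # unfold yields every length-`window` slice; the last one has no target.
    xs = series.unfold(0, window, 1)[:-1]      # (n - window, window)
    ys = series[window:]                       # value right after each window
    return xs.unsqueeze(-1), ys.unsqueeze(-1)  # add a feature dimension

class WindowLSTM(nn.Module):
    def __init__(self, hidden_size=16):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):
        out, _ = self.lstm(x)       # (batch, window, hidden); state starts at zero
        return self.fc(out[:, -1])  # predict from the last time step only

series = torch.arange(10, dtype=torch.float32)  # toy data: 0, 1, ..., 9
x, y = make_windows(series, window=4)
preds = WindowLSTM()(x)
print(x.shape, y.shape, preds.shape)
```

Because each window carries its own history, the model does not need to keep state between batches, and the windows themselves can be shuffled even though the values inside each window stay in time order.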

Best regards


Right. Shuffling is definitely bad here. I completely forgot about that.

I had seen that example earlier, but I was trying to use LSTM instead of LSTMCell. The batch_first argument is still confusing me a bit. Am I doing it correctly by calling .unsqueeze(dim=0) on the input data and .squeeze(dim=0) on the output?

Is there a better doc I can read to understand when to use retain_graph? Or could you briefly explain when to use it?
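
For reference, a small shape check of what that unsqueeze does with batch_first=True (n_features is a made-up value and the batch is random data). With batch_first=True, nn.LSTM reads its input as (batch, seq_len, features), so a DataLoader batch of 30 rows turns into a single sequence of 30 time steps:

```python
import torch
from torch import nn

n_features = 4  # hypothetical number of feature columns
lstm = nn.LSTM(input_size=n_features, hidden_size=64, num_layers=2, batch_first=True)

batch = torch.randn(30, n_features)  # what a DataLoader with batch_size=30 yields
x = batch.unsqueeze(dim=0)           # (1, 30, n_features): 1 sequence, 30 steps
out, (h_n, c_n) = lstm(x)

print(out.shape)  # one output vector per time step
print(h_n.shape)  # (num_layers, batch, hidden_size), regardless of batch_first
```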

Best regards,

Thomas’ rule of thumb for when to use retain_graph: use it within one training step if you must, but not across steps.
If that doesn’t work for you, you’re either doing something beyond my wit or not quite doing it right yet.
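
A toy illustration of that rule, using made-up tensors rather than the model above. Within one step, two losses that share part of a graph need retain_graph=True on the first backward call; calling backward a second time on an already freed graph raises the error from the question:

```python
import torch

# Within one step: two losses share the intermediate result `shared`.
x = torch.tensor([1.0, 2.0], requires_grad=True)
shared = x ** 2
loss_a = shared.sum()
loss_b = (shared * 3).sum()
loss_a.backward(retain_graph=True)  # keep buffers for the second pass
loss_b.backward()                   # last pass, so buffers may be freed
print(x.grad)                       # gradients of both losses, accumulated

# Without retain_graph, a second pass over the same graph fails.
z = torch.tensor([1.0, 2.0], requires_grad=True)
loss = (z ** 2).sum()
loss.backward()
try:
    loss.backward()
    second_call_failed = False
except RuntimeError:
    second_call_failed = True
print(second_call_failed)
```

Across training steps, each forward pass should build a fresh graph, so hitting this error there usually means something (such as a stored hidden state) still points into the previous step's graph.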

Best regards


It’s going to be “not doing it right” for a while :sweat_smile:. Thanks a lot!