How do you structure a stateful LSTM with PyTorch?

Dear PyTorch community,

I apologise for leaving this question uncategorised; I could not find a forecasting category.

Basically, I would like to find out whether anyone has ever tried to define a stateful LSTM model in PyTorch. If so, please share your wisdom with the community, as this has been an ongoing debate online (from what I have observed in several forums). If there is a link showing that the issue has already been resolved, please feel free to share it!

Below is the code I have at the moment:

import torch
import torch.nn as nn

class MultivariateLSTM(nn.Module):

    def __init__(self, input_size, hidden_size, num_layers, output_size):
        super(MultivariateLSTM, self).__init__()
        self.hidden_size = hidden_size
        # batch_first=True expects inputs of shape (batch, seq_len, input_size);
        # note the dropout argument only takes effect when num_layers > 1
        self.lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size,
                            num_layers=num_layers, batch_first=True, dropout=0.2)
        self.fc1 = nn.Linear(hidden_size, 16)
        self.fc2 = nn.Linear(16, output_size)
        self.relu = nn.ReLU()

    def forward(self, x):
        # (h, c) are zero-initialised on every call here, so the model is
        # effectively stateless across batches as written
        out, (h, c) = self.lstm(x)
        out = self.relu(out)
        out = self.fc1(out[:, -1, :])  # keep only the last time step
        out = self.relu(out)
        out = self.fc2(out)

        return out
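
For what it is worth, the change I imagine a stateful version needs is a forward that accepts the previous (h, c) and hands the new states back to the caller. This is an untested sketch of my own, not a pattern I found in the docs:

    def forward(self, x, states=None):
        # when states is None, nn.LSTM falls back to zero-initialised (h, c)
        out, (h, c) = self.lstm(x, states)
        out = self.relu(out)
        out = self.fc1(out[:, -1, :])
        out = self.relu(out)
        out = self.fc2(out)
        # return the states so the caller can carry them into the next batch
        return out, (h, c)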

Could you also share what the training loop would look like (i.e., initialisation of the hidden and cell states, passing the current (h, c) into the next iteration, etc.)?
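
To make the question concrete, something along these lines is what I have in mind, assuming the stateful forward sketched above (num_epochs and train_loader are placeholders of mine, and the detach() is my guess at how to stop backprop reaching into earlier batches):

model = MultivariateLSTM(input_size=8, hidden_size=64, num_layers=2, output_size=1)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(num_epochs):
    states = None  # reset (h, c) at the start of each pass over the data
    for x_batch, y_batch in train_loader:
        optimizer.zero_grad()
        y_pred, (h, c) = model(x_batch, states)
        loss = criterion(y_pred, y_batch)
        loss.backward()
        optimizer.step()
        # detach so the graph from previous batches is not kept alive;
        # carrying states like this assumes consecutive batches continue
        # the same sequences (same batch size, order preserved)
        states = (h.detach(), c.detach())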

I very much appreciate any help that comes by!

I think you can refer to this link:

  1. Confusion regarding PyTorch LSTMs compared to Keras stateful LSTM - #4 by deividbotina

Thank you for the link! I will test it and see if it works.

I tried both with and without passing the hidden and cell states between batches, and the results (MSE, MAE, etc.) showed little to no change. It seems that when no states are passed, the LSTM module simply initialises the hidden and cell states to zeros internally on each call, handling this behind the scenes without exposing the operations explicitly.
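
For anyone who lands here later, this small self-contained check (with illustrative shapes of my own choosing) shows that leaving the states out is the same as passing explicit zeros:

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=4, hidden_size=8, batch_first=True)
x = torch.randn(2, 5, 4)  # (batch, seq_len, input_size)

# "without": nn.LSTM zero-initialises (h, c) when none are given
out_default, _ = lstm(x)

# "with": passing explicit zero states gives the same output
h0 = torch.zeros(1, 2, 8)  # (num_layers, batch, hidden_size)
c0 = torch.zeros(1, 2, 8)
out_explicit, _ = lstm(x, (h0, c0))

print(torch.allclose(out_default, out_explicit))  # prints True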