How to properly implement LSTM layer

I am having difficulty getting my LSTM-based model to train properly; the loss stays fairly high. I have an implementation of the same model in Keras and I'm trying to convert it to PyTorch. In Keras, my LSTM layer looks like this (as part of a Sequential model):


and in PyTorch:

self.lstm = nn.LSTM(input_size=1, hidden_size=36, num_layers=1)

In Keras, the LSTM layer does not return its hidden state by default (return_state=False), so in PyTorch I'm not sure how to handle the state in the forward function, or when to initialize it. Right now I'm using this to initialize the hidden state:

def init_hidden(self):
        # (h_0, c_0), each of shape (num_layers, batch, hidden_size)
        return (
            torch.zeros(1, 1, 36, device='cuda'),
            torch.zeros(1, 1, 36, device='cuda'),
        )

And I call this at the beginning of the forward() function. When I use the lstm layer it looks like this:

out, self.hidden = self.lstm(out, self.hidden)

So maybe the hidden states are being re-initialized too often? I'm using a batch_size of 4096, if that matters.
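For reference, here is roughly how the whole module fits together. This is a sketch, not my exact code: the Linear output head is a placeholder, and I've parametrized the batch size and device in init_hidden so the snippet runs anywhere (in my actual code I hard-code a batch dimension of 1 and device='cuda', as shown above):

```python
import torch
import torch.nn as nn

class LSTMModel(nn.Module):
    # Hypothetical sketch of the full module; sizes match the snippets above.
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=36, num_layers=1)
        self.fc = nn.Linear(36, 1)  # placeholder output head

    def init_hidden(self, batch_size, device):
        # (h_0, c_0), each of shape (num_layers, batch, hidden_size);
        # in my real code this is hard-coded as torch.zeros(1, 1, 36, device='cuda')
        return (
            torch.zeros(1, batch_size, 36, device=device),
            torch.zeros(1, batch_size, 36, device=device),
        )

    def forward(self, x):
        # x: (seq_len, batch, input_size) since batch_first=False by default
        self.hidden = self.init_hidden(x.size(1), x.device)
        out, self.hidden = self.lstm(x, self.hidden)
        return self.fc(out[-1])  # prediction from the last time step
```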

Thank you!