LSTM not learning in PyTorch but working in Keras

I am facing a strange problem; let me give the context first.
I am developing a time series forecasting model using a stacked, many-to-one LSTM. I used Keras first for fast prototyping; my Keras model is below:

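from keras.models import Sequential
from keras.layers import LSTM
from keras.optimizers import Adam

model = Sequential()
model.add(LSTM(100, activation="relu", input_shape=input_shape, return_sequences=True))  # input_shape = (20, 1)
model.add(LSTM(50, activation="relu"))
model.compile(optimizer=Adam(lr=0.005), loss='mse')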

When I trained this model with the settings above, I got a loss of 0.0035 on my normalized data.

For more flexibility, I then tried PyTorch and built the model below:

import torch.nn as nn


class Generator(nn.Module):

    def __init__(self, input_dim, output_dim=1, batch_size=1):
        super(Generator, self).__init__()
        self.input_dim = input_dim
        self.output_dim = output_dim
        self.batch_size = batch_size
        self.hidden_size = 50

        # Two stacked LSTMs (100 -> 50 units) mirroring the Keras model; inputs are (seq_len, batch, features)
        self.lstm_l1 = nn.LSTM(input_size=self.input_dim, hidden_size=100, bidirectional=False, num_layers=1)
        self.lstm_l2 = nn.LSTM(input_size=100, hidden_size=self.hidden_size, bidirectional=False)
        self.fc1 = nn.Linear(self.hidden_size, self.output_dim)

    def forward(self, x):
        x, _ = self.lstm_l1(x)
        # lstm_l1's states have size 100 and cannot initialize lstm_l2 (size 50), so lstm_l2 starts from zeros
        x, _ = self.lstm_l2(x)
        x = self.fc1(x[-1])  # one output per input sequence: take the last time step
        return x

I used the same training settings for this model (learning rate and loss function), and it gave me about the same loss value, 0.0031. But when I plotted the predictions, the PyTorch model's curve was almost a straight line, whereas the Keras model learned the shape of the series properly.
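The training code itself is not shown, so the following is only a minimal sketch of the setup described (Adam with learning rate 0.005 and MSE loss on inputs of shape (20, 292, 1)). The synthetic sine series, the `TinyForecaster` class, the hidden size of 32, and the 30 epochs are all assumptions made for illustration, not the original model or data. The last lines compare the final loss against the MSE of always predicting the mean target, which may help judge whether a small loss on normalized data really rules out a flat prediction.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic stand-in for the real data: 292 windows of 20 steps, 1 feature.
seq_len, n_windows = 20, 292
t = torch.linspace(0, 30, n_windows + seq_len)
series = 0.5 + 0.5 * torch.sin(t)  # values already in [0, 1]

# Column i is the window starting at step i -> (seq_len, batch, features).
x = torch.stack([series[i:i + seq_len] for i in range(n_windows)], dim=1).unsqueeze(-1)
# Target for window i is the step right after it -> (batch, 1).
y = series[seq_len:seq_len + n_windows].unsqueeze(-1)


class TinyForecaster(nn.Module):  # hypothetical single-layer model, for illustration only
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=32)
        self.fc = nn.Linear(32, 1)

    def forward(self, x):
        out, _ = self.lstm(x)    # (seq_len, batch, hidden)
        return self.fc(out[-1])  # last time step -> (batch, 1)


model = TinyForecaster()
optimizer = torch.optim.Adam(model.parameters(), lr=0.005)
criterion = nn.MSELoss()

losses = []
for epoch in range(30):
    optimizer.zero_grad()
    pred = model(x)            # full-batch forward pass
    loss = criterion(pred, y)  # pred and y are both (batch, 1)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

# Baseline: MSE obtained by always predicting the mean of the targets.
baseline = ((y - y.mean()) ** 2).mean().item()
print(f"final loss {losses[-1]:.4f}, constant-mean baseline {baseline:.4f}")
```

If the trained model's loss is close to this baseline, the loss value alone cannot distinguish a fitted curve from a nearly constant one, so plotting the predictions remains the more informative check.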
Can someone please tell me what I am doing wrong?
I rearranged the data into the layout required by the PyTorch LSTM:

Input shape for Keras: (292, 20, 1)  # batch size, sequence length, num features
Input shape for PyTorch: (20, 292, 1)  # sequence length, batch size, num features, as required by the default LSTM
Output shape of the PyTorch model: (292, 1)  # for every 20 steps I want a prediction one step ahead
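The code that converts between the two layouts is not shown, so the sketch below only illustrates one thing that may be worth checking: `permute` and `reshape` both produce a tensor of the requested shape, but only `permute` keeps each sample's time steps together. The toy tensor (3 samples, 4 steps) and the variable names are assumptions chosen to make the effect easy to see.

```python
import torch

# Toy data in Keras layout: (batch, seq_len, features) = (3, 4, 1).
# Sample b holds the values 10*b + 0 .. 10*b + 3, so each sequence is easy to spot.
keras_layout = torch.tensor([[0., 1., 2., 3.],
                             [10., 11., 12., 13.],
                             [20., 21., 22., 23.]]).unsqueeze(-1)

# permute swaps the axes: element [t, b] is time step t of sample b.
permuted = keras_layout.permute(1, 0, 2)  # shape (4, 3, 1)

# reshape only reinterprets the flat buffer, so the samples get interleaved.
reshaped = keras_layout.reshape(4, 3, 1)  # shape (4, 3, 1), contents scrambled

print(permuted[:, 0, 0])  # values 0, 1, 2, 3: the intact sequence of sample 0
print(reshaped[:, 0, 0])  # values 0, 3, 12, 21: a mix of all three samples
```

Passing `batch_first=True` to `nn.LSTM` is another option, since the layer then accepts the (batch, sequence length, features) layout directly and no rearrangement is needed.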

PS: I tried changing the learning rate, the number of hidden units, the number of layers, and adding BatchNorm to the LSTM, but the result stays about the same in PyTorch.

Could someone please guide me?

Can you please give some insight on this?