Silent bugs in PyTorch replication of a simple LSTM model built with Keras

I am new to PyTorch and am trying to replicate a simple Keras LSTM model in PyTorch. Both models take in exactly the same data, but the PyTorch implementation produces significantly worse results.

In my toy project, I am doing time series prediction on the Google stock price, using the prices from the past 60 days to predict the next Open price.
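To make the setup concrete, here is a minimal sketch of how the 60-day windows could be built. The helper name `make_windows` and the synthetic `prices` array are made up for illustration; the real preprocessing is in the Kaggle kernel and may differ (for example, it may also scale the prices).

```python
import numpy as np


def make_windows(prices, window=60):
    """Turn a 1-D price series into (samples, window, 1) inputs and next-step targets.

    Hypothetical helper: requires len(prices) > window.
    """
    prices = np.asarray(prices, dtype=np.float32)
    # Each sample holds the `window` prices that precede the target day.
    X = np.stack([prices[i - window:i] for i in range(window, len(prices))])
    y = prices[window:]  # the price on the day right after each window
    return X[..., np.newaxis], y  # add a trailing feature axis for the LSTM


# Synthetic stand-in for the Open price column (not real data).
prices = np.arange(100, dtype=np.float32)
X_train, y_train = make_windows(prices)
print(X_train.shape, y_train.shape)
```

With this layout the targets come out as a 1-D array of length `samples`, while the inputs are 3-D, which matches the `input_shape = (X_train.shape[1], 1)` passed to the first Keras layer.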

The complete code is available in a Kaggle kernel: https://www.kaggle.com/garfieldchh/buggy-pytorch-model

Ground truth (Google stock price in January 2017):

I would like to know what mistakes or misconfigurations I made in my PyTorch implementation, and why I am not able to reproduce the Keras model in my PyTorch code. Thanks for the help!

Keras model:

from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout

regressor = Sequential()

# Four stacked LSTM layers of 50 units, each followed by 20% dropout
regressor.add(LSTM(units=50, return_sequences=True, input_shape=(X_train.shape[1], 1)))
regressor.add(Dropout(0.2))

regressor.add(LSTM(units=50, return_sequences=True))
regressor.add(Dropout(0.2))

regressor.add(LSTM(units=50, return_sequences=True))
regressor.add(Dropout(0.2))

# The last LSTM layer returns only the output of the final time step
regressor.add(LSTM(units=50))
regressor.add(Dropout(0.2))

# Single output: the predicted next Open price
regressor.add(Dense(units=1))

regressor.compile(optimizer="adam", loss="mean_squared_error")

regressor.summary()
regressor.fit(X_train, y_train, epochs=100, batch_size=32)
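As a sanity check on the architecture, the parameter counts of the two models can be worked out by hand and compared with what `regressor.summary()` prints. The sketch below assumes the PyTorch model is built with `input_dim=1`, `hidden_dim=50`, `layer_dim=4`, `output_dim=1` (the instantiation is not shown in this post, so these values are an assumption). Keras keeps one bias vector per LSTM layer, while `nn.LSTM` keeps two (`bias_ih` and `bias_hh`), so the totals are expected to differ slightly even when the architectures match.

```python
def lstm_params(input_dim, hidden_dim, bias_vectors):
    """Parameters of one LSTM layer: 4 gates, each with input weights,
    recurrent weights, and `bias_vectors` bias vectors of size hidden_dim."""
    per_gate = hidden_dim * input_dim + hidden_dim * hidden_dim + bias_vectors * hidden_dim
    return 4 * per_gate


def stacked_lstm_params(input_dim, hidden_dim, num_layers, bias_vectors):
    """First layer sees the raw input; later layers see the previous hidden state."""
    total = lstm_params(input_dim, hidden_dim, bias_vectors)
    total += (num_layers - 1) * lstm_params(hidden_dim, hidden_dim, bias_vectors)
    return total


head = 50 * 1 + 1  # final Dense / Linear layer: 50 weights + 1 bias

keras_total = stacked_lstm_params(1, 50, 4, bias_vectors=1) + head
torch_total = stacked_lstm_params(1, 50, 4, bias_vectors=2) + head  # assumed hyperparameters

print(keras_total, torch_total, torch_total - keras_total)
```

The gap comes entirely from the extra bias vector in each of the four PyTorch layers (4 layers x 4 gates x 50 units), so on its own it seems unlikely to explain a large difference in prediction quality.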

PyTorch model and training loop:

import torch
import torch.nn as nn
import torch.optim as optim


def my_training_loop(m, dl, epochs):
    opt = optim.Adam(m.parameters())
    crit = nn.MSELoss()

    for epoch in range(epochs):
        accu_loss = 0
        batch_count = 0
        for train_x, train_y in dl:
            # Move the batch to the GPU
            x = train_x.cuda()
            y = train_y.cuda()

            opt.zero_grad()
            preds = m(x)
            loss = crit(preds, y)
            loss.backward()
            opt.step()

            # Track the running loss to report the per-epoch average
            accu_loss += loss.item()
            batch_count += 1
        print(f'Epoch: {epoch}. Loss: {accu_loss / batch_count}')


class MyLSTM(nn.Module):

    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim):
        super().__init__()
        self.layer_dim = layer_dim
        self.hidden_dim = hidden_dim
        # Stacked LSTM; dropout=0.2 is applied between layers, not after the last one
        self.lstm = nn.LSTM(input_size=input_dim, hidden_size=hidden_dim,
                            num_layers=layer_dim, bias=True, batch_first=True,
                            dropout=0.2)
        self.dropout = nn.Dropout(p=0.2)
        self.fc = nn.Linear(in_features=hidden_dim, out_features=output_dim)

    def forward(self, x):
        # Zero initial hidden and cell states, one per layer, on the same device as x
        h0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim, device=x.device)
        c0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim, device=x.device)

        o, _ = self.lstm(x, (h0, c0))
        # Use only the output of the last time step, then dropout and the linear head
        o = self.fc(self.dropout(o[:, -1, :]))
        return o
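One check that might help narrow this down is the shapes of `preds` and `y` in the training loop. The model returns a tensor of shape `(batch, output_dim)`, and if `train_y` happens to have shape `(batch,)` (an assumption, the data loader is not shown here), `nn.MSELoss` broadcasts the two tensors against each other and averages over a `(batch, batch)` grid instead of comparing each prediction with its own target. The sketch below uses made-up tensors and a hypothetical helper `mse_shape_check` to show the effect; it runs on the CPU.

```python
import torch
import torch.nn as nn


def mse_shape_check(preds, y):
    """Return (loss with shapes as given, loss with y reshaped to match preds)."""
    crit = nn.MSELoss()
    # If the shapes differ, PyTorch broadcasts them (and emits a UserWarning).
    loss_as_given = crit(preds, y)
    # Reshaping the target to the prediction's shape compares element by element.
    loss_aligned = crit(preds, y.view_as(preds))
    return loss_as_given.item(), loss_aligned.item()


# Made-up example: perfect predictions, but mismatched shapes.
preds = torch.tensor([[1.0], [2.0], [3.0]])  # shape (3, 1), like the model output
y = torch.tensor([1.0, 2.0, 3.0])            # shape (3,), assumed target shape

broadcast_loss, aligned_loss = mse_shape_check(preds, y)
print(preds.shape, y.shape, broadcast_loss, aligned_loss)
```

If printing `preds.shape` and `y.shape` inside the real loop shows a similar mismatch, reshaping the target (for example with `y.view_as(preds)`) before calling the loss would be one thing to try. Whether this accounts for the gap with the Keras model is not confirmed.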

Since I can't post more than one image per post, I will add the others here.

Keras model prediction (the trend has been captured):

PyTorch model prediction (nothing has been learned):