Hi!

I’m currently developing a multi-step time series forecasting model using a GRU (or a bidirectional GRU). The idea is to use this model to predict the temperature of the next 2 months given the previous 3 months (my dataset contains the daily temperature from 1995 to 2020).
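For reference, here is a minimal sketch of how the input/target windows can be built from a daily series. It assumes about 90 input days (3 months) and 60 output days (2 months). The helper name `make_windows` and the synthetic series are illustrative only; this is not the exact preprocessing from my script.

```python
import numpy as np

def make_windows(series, n_steps_in, n_steps_out):
    """Split a 1-D series into (input window, target window) pairs.

    Each sample uses n_steps_in consecutive values as input and the
    following n_steps_out values as the multi-step target.
    """
    X, y = [], []
    for start in range(len(series) - n_steps_in - n_steps_out + 1):
        end_in = start + n_steps_in
        X.append(series[start:end_in])
        y.append(series[end_in:end_in + n_steps_out])
    # Add a trailing feature axis so X has shape (samples, n_steps_in, 1),
    # matching a batch_first GRU/LSTM with input_size = 1.
    return np.array(X)[..., np.newaxis], np.array(y)

# Hypothetical sizes: ~3 months (90 days) in, ~2 months (60 days) out.
temps = np.arange(200, dtype=np.float32)  # stand-in for the daily temperature series
X, y = make_windows(temps, n_steps_in=90, n_steps_out=60)
print(X.shape, y.shape)  # (51, 90, 1) (51, 60)
```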

However, during training the loss gets stuck after the first epoch and neither decreases nor increases in any of the remaining epochs.

```
EPOCH 1:
Train Loss : 0.548
EPOCH 2:
Train Loss : 0.548
EPOCH 3:
Train Loss : 0.548
EPOCH 4:
Train Loss : 0.548
EPOCH 5:
Train Loss : 0.548
...
```
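The training loop is not shown here. As a point of comparison, this is a minimal sketch of the loop and sanity checks one would expect for this setup. It uses random stand-in data, hypothetical sizes, and an assumed Adam optimizer with `lr=1e-3`. If the gradients are zero or the weights do not change between steps, the problem lies in the loop rather than in the model definition.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins: random data and a tiny GRU + linear head.
X = torch.randn(64, 90, 1)   # (batch, n_steps_in, n_features)
y = torch.randn(64, 60)      # (batch, n_steps_out)

gru = nn.GRU(input_size=1, hidden_size=16, num_layers=2, batch_first=True)
head = nn.Linear(16, 60)
params = list(gru.parameters()) + list(head.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)  # assumed optimizer and learning rate
criterion = nn.L1Loss()  # mean absolute error

before = [p.detach().clone() for p in params]
losses = []
for epoch in range(5):
    optimizer.zero_grad()          # clear gradients from the previous step
    out, _ = gru(X)                # hidden state defaults to zeros
    pred = head(out[:, -1, :])     # last time step -> n_steps_out values
    loss = criterion(pred, y)
    loss.backward()                # compute gradients
    optimizer.step()               # update the weights
    losses.append(loss.item())

# If the loss never moves, check that gradients exist and the weights actually change.
grad_norm = sum(p.grad.norm().item() for p in params if p.grad is not None)
changed = any(not torch.equal(b, p.detach()) for b, p in zip(before, params))
print(losses, grad_norm, changed)
```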

Since I could not find any error, I implemented the same application in Keras, with the same hyperparameters, the same input preprocessing (including normalization), the same loss (L1Loss, i.e. MAE) and the same number of layers. The only difference is that in PyTorch I used a GRU, while in Keras I used an LSTM with a ReLU activation function in each LSTM cell. In Keras, however, everything works and the loss decreases normally.
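One concrete consequence of that difference: PyTorch’s `nn.GRU` has no `activation` argument. Its gates use sigmoid and its candidate state always uses tanh. As a result, its outputs are bounded, while the Keras LSTM above uses ReLU. The sketch below illustrates this with hypothetical input sizes. It does not claim this is the cause of the stuck loss.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# nn.GRU has no `activation` argument: its candidate state always uses tanh
# and its gates use sigmoid, unlike Keras' LSTM(activation='relu').
gru = nn.GRU(input_size=1, hidden_size=64, num_layers=2, batch_first=True)
x = torch.randn(8, 90, 1) * 10  # scaled-up inputs to push activations toward saturation
out, h_n = gru(x)

# With a zero initial state, every GRU output is a convex mix of tanh values
# and the previous state, so it stays within [-1, 1] however large the input is.
print(out.shape, out.abs().max().item())
```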

```
Epoch 1/50
284/284 [==============================] - 35s 114ms/step - loss: 0.1528
Epoch 2/50
284/284 [==============================] - 31s 110ms/step - loss: 0.0922
Epoch 3/50
284/284 [==============================] - 33s 115ms/step - loss: 0.0845
Epoch 4/50
284/284 [==============================] - 30s 107ms/step - loss: 0.0740
Epoch 5/50
284/284 [==============================] - 32s 114ms/step - loss: 0.0709
```

Below is the GRU model in PyTorch, followed by the Keras model.

```
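import torch
import torch.nn as nn

# `device` and `n_steps_out` are defined elsewhere in my script;
# this device line is just the typical definition.
device = "cuda" if torch.cuda.is_available() else "cpu"

class GRU_net(nn.Module):
    def __init__(self, hidden_dim, num_layers, output_size, drop_prob=0.0):
        super(GRU_net, self).__init__()
        self.hidden_dim = hidden_dim
        self.num_layers = num_layers
        self.output_size = output_size
        # GRU: one input feature (the daily temperature) per time step
        self.gru = nn.GRU(input_size=1, hidden_size=hidden_dim, num_layers=num_layers,
                          bidirectional=False, batch_first=True, dropout=drop_prob)
        # fully connected layer: last hidden state -> n_steps_out values
        self.fc = nn.Linear(hidden_dim * 1, output_size)

    def init_hidden(self, batch_size):
        # zero initial state: (num_layers * num_directions, batch, hidden_dim)
        hidden = torch.zeros(self.num_layers * 1, batch_size, self.hidden_dim, device=torch.device(device))
        return hidden

    def forward(self, x):
        # x: (batch, n_steps_in, 1) because batch_first=True
        batch_size = x.size(0)
        hidden = self.init_hidden(batch_size)
        gru_out, h = self.gru(x, hidden)
        # keep only the output of the last time step
        gru_out = gru_out[:, -1, :]
        out = self.fc(gru_out)
        return out

hidden_dim = 64
num_layers = 2
output_size = n_steps_out  # forecast horizon, defined elsewhere
gru_net = GRU_net(hidden_dim, num_layers, output_size).to(device)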
```

Keras:

```
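from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# n_steps_in, n_features (= 1) and n_steps_out are defined elsewhere
model = Sequential()
model.add(LSTM(64, activation='relu', return_sequences=True, input_shape=(n_steps_in, n_features)))
model.add(LSTM(64, activation='relu'))
model.add(Dense(n_steps_out))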
```
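For a closer like-for-like comparison, here is a sketch of a PyTorch counterpart of the Keras model: two stacked 64-unit LSTM layers and a linear head. The class name `LSTMNet` and the sizes are hypothetical. `nn.LSTM` always uses tanh, so the Keras ReLU activation is not reproduced.

```python
import torch
import torch.nn as nn

class LSTMNet(nn.Module):
    """Rough PyTorch counterpart of the Keras model (hypothetical name).

    Two stacked LSTM layers and a Linear head producing n_steps_out values.
    nn.LSTM always uses tanh, so the Keras ReLU activation is NOT reproduced.
    """
    def __init__(self, n_features, hidden_dim, n_steps_out):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_features, hidden_size=hidden_dim,
                            num_layers=2, batch_first=True)
        self.fc = nn.Linear(hidden_dim, n_steps_out)

    def forward(self, x):
        out, _ = self.lstm(x)            # zero initial (h, c) by default
        return self.fc(out[:, -1, :])    # last step only, like return_sequences=False

model = LSTMNet(n_features=1, hidden_dim=64, n_steps_out=60)
pred = model(torch.randn(4, 90, 1))
print(pred.shape)  # torch.Size([4, 60])
```

Keras also initializes LSTM weights differently by default (Glorot-uniform input kernels, orthogonal recurrent kernels, forget-gate bias of 1), so even this version is not an exact replica.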

I also checked the output shape of every layer, and the shapes match between the PyTorch and Keras implementations.
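A per-layer shape check in PyTorch, roughly comparable to Keras’ `model.summary()`, can be done with forward hooks. This sketch uses a compact copy of the model. The class name `TinyGRUNet` and the sizes (90 input days, 60 output days) are assumptions.

```python
import torch
import torch.nn as nn

# Compact stand-in for GRU_net (hypothetical sizes: 90 input days, 60 output days).
class TinyGRUNet(nn.Module):
    def __init__(self, hidden_dim=64, num_layers=2, output_size=60):
        super().__init__()
        self.gru = nn.GRU(1, hidden_dim, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_size)

    def forward(self, x):
        out, _ = self.gru(x)
        return self.fc(out[:, -1, :])

shapes = {}

def record(name):
    def hook(module, inputs, output):
        # nn.GRU returns (output, h_n); keep only the sequence output.
        first = output[0] if isinstance(output, tuple) else output
        shapes[name] = tuple(first.shape)
    return hook

model = TinyGRUNet()
for name, module in model.named_children():
    module.register_forward_hook(record(name))

model(torch.randn(32, 90, 1))
print(shapes)  # {'gru': (32, 90, 64), 'fc': (32, 60)}
```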

Thank you in advance for the help.