I’m currently developing a multi-step time series forecasting model using a GRU (or a bidirectional GRU). The goal is to predict the temperature for the next two months given the previous three (the dataset contains daily temperatures from 1995 to 2020).
However, during training the loss gets stuck after the first epoch and neither decreases nor increases for any of the remaining epochs.
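To make the setup concrete, the windowing described above can be sketched roughly as follows. This is a minimal sketch, not my exact preprocessing: the helper name `split_sequence`, the 90-day and 60-day window lengths, and the synthetic series are assumptions for illustration only.

```python
import numpy as np

def split_sequence(series, n_steps_in, n_steps_out):
    """Slice a 1-D series into (input window, target window) pairs."""
    X, y = [], []
    last_start = len(series) - n_steps_in - n_steps_out
    for start in range(last_start + 1):
        mid = start + n_steps_in
        X.append(series[start:mid])            # past n_steps_in days
        y.append(series[mid:mid + n_steps_out])  # following n_steps_out days
    # Add a trailing feature axis so X has shape (samples, n_steps_in, 1)
    return np.asarray(X)[..., None], np.asarray(y)

# Hypothetical values: roughly three months in, two months out
n_steps_in, n_steps_out = 90, 60
series = np.arange(365, dtype=np.float32)  # stand-in for one year of daily temperatures
X, y = split_sequence(series, n_steps_in, n_steps_out)
print(X.shape, y.shape)
```

Each row of `X` is one input sequence with a single feature per day, which matches `input_size = 1` in the GRU, and each row of `y` holds the days that immediately follow it.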
```
EPOCH 1: Train Loss : 0.548
EPOCH 2: Train Loss : 0.548
EPOCH 3: Train Loss : 0.548
EPOCH 4: Train Loss : 0.548
EPOCH 5: Train Loss : 0.548
...
```
Since I could not find any error, I built the same application in Keras, using the same hyperparameters, the same input preprocessing (including normalization), the same loss (L1Loss, i.e. MAE), and the same number of layers. The only difference is that in PyTorch I used a GRU, while in Keras I used an LSTM with a ‘ReLU’ activation function in each LSTM layer. In Keras, however, everything works and the loss decreases normally.
```
Epoch 1/50
284/284 [==============================] - 35s 114ms/step - loss: 0.1528
Epoch 2/50
284/284 [==============================] - 31s 110ms/step - loss: 0.0922
Epoch 3/50
284/284 [==============================] - 33s 115ms/step - loss: 0.0845
Epoch 4/50
284/284 [==============================] - 30s 107ms/step - loss: 0.0740
Epoch 5/50
284/284 [==============================] - 32s 114ms/step - loss: 0.0709
```
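The two logs are only comparable if the loss is really the same quantity in both frameworks. As a quick sanity check, here is a small sketch showing that `nn.L1Loss` with its default `reduction="mean"` is the plain mean absolute error. The tensors are random stand-ins with an assumed shape of 8 samples by 60 output steps, not my actual data.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
pred = torch.randn(8, 60)    # hypothetical batch: 8 forecasts of 60 steps each
target = torch.randn(8, 60)

l1 = nn.L1Loss()(pred, target)        # default reduction="mean"
mae = (pred - target).abs().mean()    # MAE written out by hand

print(torch.allclose(l1, mae))
```

Keras averages the absolute error over the last axis and then over the batch, which should give the same number when every sample has the same length, so the loss definition itself does not explain the gap between 0.548 and 0.15.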
Below is the GRU in PyTorch, followed by the model in Keras.
```python
import torch
import torch.nn as nn

# `device` and `n_steps_out` are defined earlier in my script

class GRU_net(nn.Module):
    def __init__(self, hidden_dim, num_layers, output_size, drop_prob=0.0):
        super(GRU_net, self).__init__()
        self.hidden_dim = hidden_dim
        self.num_layers = num_layers
        self.output_size = output_size
        # GRU
        self.gru = nn.GRU(input_size=1, hidden_size=hidden_dim, num_layers=num_layers,
                          bidirectional=False, batch_first=True, dropout=drop_prob)
        # fully connected layers
        self.fc = nn.Linear(hidden_dim * 1, output_size)

    def init_hidden(self, batch_size):
        # zero initial hidden state of shape (num_layers, batch, hidden_dim)
        hidden = torch.zeros(self.num_layers * 1, batch_size, self.hidden_dim,
                             device=torch.device(device))
        return hidden

    def forward(self, x):
        batch_size = x.size(0)
        hidden = self.init_hidden(batch_size)
        gru_out, h = self.gru(x, hidden)
        gru_out = gru_out[:, -1, :]  # keep only the last time step
        out = self.fc(gru_out)
        return out

hidden_dim = 64
num_layers = 2
output_size = n_steps_out
gru_net = GRU_net(hidden_dim, num_layers, output_size).to(device)
```
```python
# `n_steps_in`, `n_features` and `n_steps_out` are defined earlier in my script
model = Sequential()
model.add(LSTM(64, activation='relu', return_sequences=True, input_shape=(n_steps_in, n_features)))
model.add(LSTM(64, activation='relu'))
model.add(Dense(n_steps_out))
```
Furthermore, I checked the output shape of each layer, and the shapes match between the PyTorch and the Keras implementations.
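The shape check on the PyTorch side can be reproduced with something like the sketch below. The sizes (batch of 32, 90 input days, 60 output days) are assumed values for illustration; the layers mirror the ones in `GRU_net` but are built standalone here so the snippet runs on its own.

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for illustration
batch_size, n_steps_in, n_steps_out = 32, 90, 60
hidden_dim, num_layers = 64, 2

gru = nn.GRU(input_size=1, hidden_size=hidden_dim, num_layers=num_layers, batch_first=True)
fc = nn.Linear(hidden_dim, n_steps_out)

x = torch.randn(batch_size, n_steps_in, 1)  # (batch, time, features)
gru_out, h = gru(x)                          # initial hidden state defaults to zeros
last_step = gru_out[:, -1, :]                # top-layer output at the last time step
out = fc(last_step)

print(gru_out.shape)    # (batch, time, hidden)
print(h.shape)          # (layers, batch, hidden), regardless of batch_first
print(last_step.shape)  # (batch, hidden)
print(out.shape)        # (batch, n_steps_out)
```

For a unidirectional GRU, `gru_out[:, -1, :]` is the same tensor as `h[-1]`, so the slicing in `forward` is equivalent to taking the final hidden state of the top layer, which corresponds to what the second Keras LSTM returns without `return_sequences`.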
Thank you in advance for the help.