Hi, I am working with time series data and have run into a problem that confuses me. I tuned a neural network with the same architecture in both Keras and PyTorch, but the two implementations give different results.

This is not the only problem. The Keras model gives the same results every time I train it. The PyTorch model, however, matches the Keras model in only about 10% of runs; most of the time its results are much worse (and nothing like the Keras results).

Any guidance would be appreciated. Thanks.

Keras model:

```
model_input = keras.Input(shape=(x_train_T.shape[1], 8))
x_1 = layers.GRU(75,return_sequences=True)(model_input)
x_1 = layers.GRU(90)(x_1)
x_1 = layers.Dense(95)(x_1)
x_1 = layers.Dense(15)(x_1)
model = keras.models.Model(model_input, x_1)
model.compile(optimizer=adam_optim, loss="mse", metrics=["accuracy"])
model.fit(x_train_T, y_train, batch_size=1, epochs = 100)
```

PyTorch model:

```
import time

import numpy as np
import torch
import torch.nn as nn

# `device`, `x_train_T` and `y_train` are defined earlier (not shown).


class GRU(nn.Module):
    def __init__(self, input_size, hidden_size_1, hidden_size_2, hidden_size_3,
                 output_size, num_layers, device):
        super().__init__()
        self.input_size = input_size
        self.hidden_size_1 = hidden_size_1
        self.hidden_size_2 = hidden_size_2
        self.hidden_size_3 = hidden_size_3
        self.num_layers = num_layers
        self.device = device
        self.gru_1 = nn.GRU(input_size, hidden_size_1, num_layers, batch_first=True)
        self.gru_2 = nn.GRU(hidden_size_1, hidden_size_2, num_layers, batch_first=True)
        self.fc_1 = nn.Linear(hidden_size_2, hidden_size_3)
        # Use the constructor argument rather than the global `output_dim`.
        self.fc_out = nn.Linear(hidden_size_3, output_size)

    def forward(self, x):
        batch_size = x.size(0)
        # Zero initial hidden states for both GRU layers.
        h_1 = torch.zeros(self.num_layers, batch_size, self.hidden_size_1, device=self.device)
        h_2 = torch.zeros(self.num_layers, batch_size, self.hidden_size_2, device=self.device)
        out_gru_1, h_1 = self.gru_1(x, h_1)
        out_gru_2, h_2 = self.gru_2(out_gru_1, h_2)
        # Keep only the last time step, like the second Keras GRU (return_sequences=False).
        out_dense_1 = self.fc_1(out_gru_2[:, -1, :])
        return self.fc_out(out_dense_1)


input_dim = 8
hidden_dim_1 = 75
hidden_dim_2 = 90
hidden_dim_3 = 95
num_layers = 1
output_dim = 15
num_epochs = 100

# Move the parameters to the same device as the inputs.
model = GRU(input_size=input_dim, hidden_size_1=hidden_dim_1, hidden_size_2=hidden_dim_2,
            hidden_size_3=hidden_dim_3, output_size=output_dim, num_layers=num_layers,
            device=device).to(device)
criterion = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

for t in range(num_epochs):
    start_time = time.time()
    losses = []
    # One sample per step, matching batch_size=1 in the Keras call.
    for i in range(x_train_T.size(0)):
        inputs = torch.as_tensor(x_train_T[i:i+1], dtype=torch.float32, device=device)
        target = torch.as_tensor(y_train[i:i+1], dtype=torch.float32, device=device)
        y_train_pred = model(inputs)
        loss_ = criterion(y_train_pred, target)
        optimizer.zero_grad()
        loss_.backward()
        optimizer.step()
        # Store a plain float so the list does not hold on to the autograd graph.
        losses.append(loss_.item())
    mean_loss = np.mean(losses)
    end_time = time.time()
    print("Epoch ", t, "MSE: ", mean_loss, "///epoch time: {0} seconds".format(round(end_time - start_time, 2)))
```

In rare cases, the loss of both models starts at approximately 0.09 and ends at approximately 0.015.

In most cases, the Keras loss behaves the same way, but the PyTorch loss stays at about 0.08.

In other words, sometimes the PyTorch model trains and sometimes it does not.
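
To check whether the run-to-run difference really comes from the random initial weights, one option is to fix the random seeds before building the model. The following is only a minimal sketch: `set_seed` is a hypothetical helper name, and the small `nn.Linear` layer is a stand-in used just to show that the same seed yields the same initial weights.

```python
import random

import numpy as np
import torch


def set_seed(seed: int) -> None:
    """Hypothetical helper: seed Python, NumPy and PyTorch generators."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)


# Build the same (stand-in) layer twice from the same seed.
set_seed(0)
a = torch.nn.Linear(8, 4).weight.detach().clone()
set_seed(0)
b = torch.nn.Linear(8, 4).weight.detach().clone()

# True when both layers received identical initial weights.
same_init = torch.equal(a, b)
print(same_init)
```

If a seed that trains well can be reproduced this way, that would at least suggest the initial weights are what decides whether training succeeds.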

I think I should initialize the PyTorch layers the same way as the Keras layers, but how?

The GRU layer's initialization in Keras is as follows:

```
def __init__(units, activation='tanh', recurrent_activation='sigmoid', use_bias=True, kernel_initializer='glorot_uniform', recurrent_initializer='orthogonal', bias_initializer='zeros', kernel_regularizer=None, recurrent_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, recurrent_constraint=None, bias_constraint=None, dropout=0.0, recurrent_dropout=0.0, return_sequences=False, return_state=False, go_backwards=False, stateful=False, unroll=False, time_major=False, reset_after=True, **kwargs)
```

And for the Dense (linear) layers:

```
def __init__(units, activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None, **kwargs)
```

How can I initialize the layers in PyTorch the same way?
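
For reference, this is roughly what I have in mind, based on the Keras defaults listed above (`glorot_uniform` for the input kernel, `orthogonal` for the recurrent kernel, `zeros` for the biases). It is only a sketch under assumptions: `init_like_keras` is a hypothetical helper name, and I am assuming that `nn.init.xavier_uniform_` and `nn.init.orthogonal_` are close enough counterparts of the Keras initializers. The weight layouts differ between the two libraries, so this may not be an exact match.

```python
import torch
import torch.nn as nn


def init_like_keras(module: nn.Module) -> None:
    """Hypothetical helper: apply Keras-style default initializers in place."""
    if isinstance(module, nn.GRU):
        for name, param in module.named_parameters():
            if name.startswith("weight_ih"):
                # Input-to-hidden weights: Glorot (Xavier) uniform.
                nn.init.xavier_uniform_(param)
            elif name.startswith("weight_hh"):
                # Hidden-to-hidden weights: orthogonal.
                nn.init.orthogonal_(param)
            elif name.startswith("bias"):
                # Both bias vectors: zeros.
                nn.init.zeros_(param)
    elif isinstance(module, nn.Linear):
        nn.init.xavier_uniform_(module.weight)
        if module.bias is not None:
            nn.init.zeros_(module.bias)


# Small stand-in layers with the sizes from the question (8 inputs, 75 units).
gru = nn.GRU(8, 75, batch_first=True)
fc = nn.Linear(75, 15)
for layer in (gru, fc):
    init_like_keras(layer)
```

On the full model this could presumably be applied with `model.apply(init_like_keras)`, since `nn.Module.apply` calls the function on every submodule.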