Hi guys, I'm having trouble converting what seems to be a simple Keras LSTM model to PyTorch. I am new to PyTorch, so any help is appreciated.

This is the model from Keras:

```
from keras.layers import Bidirectional, CuDNNLSTM, Dense, Input
from keras.models import Model

# Note: Attention and matthews_correlation are not defined in this snippet.


def model_lstm(input_shape):
    inp = Input(shape=(input_shape[1], input_shape[2],))
    # This is the LSTM layer
    # Bidirectional means the 160 chunks are processed in both directions, 0 to 159 and 159 to 0
    # 128 and 64 are the number of cells used; too many can overfit and too few can underfit
    x = Bidirectional(CuDNNLSTM(128, return_sequences=True))(inp)
    # The second LSTM can give more firepower to the model, but can overfit it too
    x = Bidirectional(CuDNNLSTM(64, return_sequences=True))(x)
    # Attention is a newer technique that can be applied to a recurrent NN to give more weight to a signal found
    # in the middle of the data; it helps most with long sequences. A plain RNN leaves all the responsibility
    # for detecting the signal to the last cell. Google "RNN Attention" for more information :)
    x = Attention(input_shape[1])(x)
    # An intermediate fully connected (Dense) layer can help deal with nonlinear outputs
    x = Dense(64, activation="relu")(x)
    # A binary classification like this must end with shape (1,)
    x = Dense(1, activation="sigmoid")(x)
    model = Model(inputs=inp, outputs=x)
    # Note the addition of the matthews_correlation metric at compile time; it is a key success factor
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=[matthews_correlation])
    return model
```
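
A rough PyTorch counterpart of this architecture might look like the sketch below. The custom `Attention` layer is not shown in the snippet above, so `AttentionPool` is an assumption: a simple additive attention that scores each time step, normalises the scores with a softmax over time, and returns the weighted sum of the step features. The names `AttentionPool` and `LSTMModel` are made up for illustration, and the model returns raw logits instead of applying the sigmoid inside the network.

```python
import torch
import torch.nn as nn


class AttentionPool(nn.Module):
    """Assumed stand-in for the Keras `Attention` layer (not shown in the post)."""

    def __init__(self, feature_dim, step_dim):
        super().__init__()
        # One score weight per feature and one bias per time step.
        self.weight = nn.Parameter(torch.empty(feature_dim, 1))
        nn.init.xavier_uniform_(self.weight)
        self.bias = nn.Parameter(torch.zeros(step_dim))

    def forward(self, x):  # x: (batch, steps, features)
        scores = torch.tanh(x.matmul(self.weight).squeeze(-1) + self.bias)  # (batch, steps)
        alpha = torch.softmax(scores, dim=1)  # attention weights over time
        return (x * alpha.unsqueeze(-1)).sum(dim=1)  # (batch, features)


class LSTMModel(nn.Module):
    def __init__(self, n_features, n_steps):
        super().__init__()
        # batch_first=True matches the Keras layout (batch, steps, features).
        self.lstm1 = nn.LSTM(n_features, 128, batch_first=True, bidirectional=True)
        # A bidirectional LSTM concatenates both directions, hence 2 * 128 inputs.
        self.lstm2 = nn.LSTM(2 * 128, 64, batch_first=True, bidirectional=True)
        self.attention = AttentionPool(2 * 64, n_steps)
        self.fc = nn.Linear(2 * 64, 64)
        self.out = nn.Linear(64, 1)

    def forward(self, x):
        x, _ = self.lstm1(x)  # (batch, steps, 256), like return_sequences=True
        x, _ = self.lstm2(x)  # (batch, steps, 128)
        x = self.attention(x)  # (batch, 128)
        x = torch.relu(self.fc(x))
        return self.out(x)  # raw logits, shape (batch, 1)


model = LSTMModel(n_features=19, n_steps=160)
logits = model(torch.randn(4, 160, 19))  # dummy batch of 4 samples
probs = torch.sigmoid(logits)  # probabilities, as the Keras sigmoid output
```

Pairing the logits with `nn.BCEWithLogitsLoss` corresponds to the Keras `sigmoid` plus `binary_crossentropy` combination and tends to be more numerically stable than a sigmoid followed by `nn.BCELoss`; `torch.sigmoid` is then only needed when actual probabilities are required.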

I have a train tensor of shape ([1, 2904, 160, 19]), a validation tensor of shape ([1, 2904]), and a test tensor of shape ([6779, 160, 57]).
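
To make the shapes concrete, here is a small sketch that feeds tensors of this size to a `DataLoader`. It assumes the `[1, 2904]` tensor holds one label per training sample and that the leading dimension of size 1 is just a leftover wrapper; the values are random placeholders, not real data.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy tensors with the shapes quoted above (random values, illustration only).
X = torch.randn(1, 2904, 160, 19)
y = torch.randint(0, 2, (1, 2904)).float()

# Drop the leading singleton dimension so the first axis indexes samples.
X = X.squeeze(0)  # (2904, 160, 19) = (samples, time steps, features)
y = y.squeeze(0)  # (2904,)

# An nn.LSTM built with batch_first=True expects (batch, time steps, features).
loader = DataLoader(TensorDataset(X, y), batch_size=128, shuffle=True)
xb, yb = next(iter(loader))
print(xb.shape, yb.shape)
```

One thing worth double-checking: the quoted test tensor has 57 features per time step while the train tensor has 19. An `nn.LSTM` created with `input_size=19` will not accept 57-feature inputs, so the two feature sets need to match before prediction.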

And this is the training loop:

```
from keras import backend as K
from keras.callbacks import ModelCheckpoint
from sklearn.model_selection import StratifiedKFold

# First, create a set of indexes of the 5 folds
splits = list(StratifiedKFold(n_splits=N_SPLITS, shuffle=True, random_state=2019).split(X, y))
preds_val = []
y_val = []
# Then, iterate over each fold
# enumerate(['a', 'b', 'c']) yields (0, 'a'), (1, 'b'), (2, 'c')
for idx, (train_idx, val_idx) in enumerate(splits):
    K.clear_session()  # reset the Keras backend state left over from the previous fold
    print("Beginning fold {}".format(idx + 1))
    # use the indexes to extract the folds in the train and validation data
    train_X, train_y, val_X, val_y = X[train_idx], y[train_idx], X[val_idx], y[val_idx]
    # instantiate the model for this fold
    model = model_lstm(train_X.shape)
    # This checkpoint helps to avoid overfitting. It only saves the model weights when the
    # validation matthews_correlation is greater than the best one so far.
    ckpt = ModelCheckpoint('weights_{}.h5'.format(idx), save_best_only=True, save_weights_only=True, verbose=1,
                           monitor='val_matthews_correlation', mode='max')
    # Train, train, train
    model.fit(train_X, train_y, batch_size=128, epochs=50, validation_data=(val_X, val_y), callbacks=[ckpt])
    # load the best weights saved by the checkpoint
    model.load_weights('weights_{}.h5'.format(idx))
    # Add the predictions of the validation to the list preds_val
    preds_val.append(model.predict(val_X, batch_size=512))
    # and the val true y
    y_val.append(val_y)
```
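
PyTorch has no built-in `fit` or `ModelCheckpoint`, so the same loop has to be written by hand. The sketch below is one possible way to do it, not a definitive port: `TinyLSTM` is a hypothetical placeholder network to keep the example short, `run_cv` and `predict_proba` are made-up helper names, the best weights are kept in memory rather than written to an `.h5` file, and the Matthews correlation is computed on the whole validation fold with an assumed 0.5 threshold.

```python
import copy

import numpy as np
import torch
import torch.nn as nn
from sklearn.metrics import matthews_corrcoef
from sklearn.model_selection import StratifiedKFold


class TinyLSTM(nn.Module):
    """Hypothetical placeholder network; swap in the real model."""

    def __init__(self, n_features):
        super().__init__()
        self.lstm = nn.LSTM(n_features, 16, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * 16, 1)

    def forward(self, x):
        x, _ = self.lstm(x)
        return self.out(x.mean(dim=1)).squeeze(-1)  # one logit per sample


def predict_proba(model, X, batch_size=512):
    """Rough counterpart of model.predict: probabilities, no gradients."""
    model.eval()
    with torch.no_grad():
        chunks = [torch.sigmoid(model(X[i:i + batch_size]))
                  for i in range(0, len(X), batch_size)]
    return torch.cat(chunks)


def run_cv(X, y, n_splits=5, epochs=50, batch_size=128):
    labels = y.long().numpy()
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=2019)
    preds_val, y_val = [], []
    for fold, (train_idx, val_idx) in enumerate(skf.split(np.zeros(len(labels)), labels)):
        print("Beginning fold {}".format(fold + 1))
        train_idx, val_idx = torch.as_tensor(train_idx), torch.as_tensor(val_idx)
        model = TinyLSTM(X.shape[2])  # fresh model for every fold
        optimizer = torch.optim.Adam(model.parameters())
        loss_fn = nn.BCEWithLogitsLoss()  # sigmoid + binary cross-entropy
        best_mcc, best_state = -float("inf"), None
        for epoch in range(epochs):
            model.train()
            perm = train_idx[torch.randperm(len(train_idx))]  # reshuffle each epoch
            for i in range(0, len(perm), batch_size):
                batch = perm[i:i + batch_size]
                optimizer.zero_grad()
                loss = loss_fn(model(X[batch]), y[batch])
                loss.backward()
                optimizer.step()
            # Manual "checkpoint": keep the weights with the best validation MCC.
            val_pred = predict_proba(model, X[val_idx])
            mcc = matthews_corrcoef(labels[val_idx.numpy()], (val_pred > 0.5).long().numpy())
            if mcc > best_mcc:
                best_mcc, best_state = mcc, copy.deepcopy(model.state_dict())
        model.load_state_dict(best_state)  # restore the best weights of this fold
        preds_val.append(predict_proba(model, X[val_idx]))
        y_val.append(y[val_idx])
    return torch.cat(preds_val), torch.cat(y_val)


# Tiny random dataset, only to show that the loop runs end to end.
torch.manual_seed(0)
X_demo = torch.randn(200, 20, 4)
y_demo = (torch.rand(200) > 0.7).float()
preds, targets = run_cv(X_demo, y_demo, n_splits=5, epochs=2)
```

To persist the best weights per fold the way `ModelCheckpoint` does, `torch.save(best_state, path)` and `model.load_state_dict(torch.load(path))` could replace the in-memory copy.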

I'm trying to understand this piece of code and implement it in PyTorch. The task comes from a Kaggle competition on detecting partial discharges (PD). I understand the characteristics of the data and I have extracted features, but I'm having trouble writing the network.