Hello! A lot of posts on Keras → PyTorch LSTM migration focus on reproducing the model architecture, but I couldn't find anything that covers transferring the trained weights; the closest I've found is this SO post, but I'd love your help.

Essentially, I have a Keras LSTM that I want to migrate:

```
# Keras
model.add(
    Bidirectional(
        LSTM(256, dropout=0.15, recurrent_dropout=0.2, return_sequences=True),
        merge_mode='concat',
    )
)
o_keras = model(x)
```

In PyTorch, my implementation is simply

```
# input_size must match the last dimension of x; hidden_size matches Keras' 256 units
lstm = nn.LSTM(input_size=256, hidden_size=256, bidirectional=True, batch_first=True)
o_pytorch, (h, c) = lstm(x)
```

The output tensor *shapes* are identical, which is a good sign:

```
o_keras.shape == o_pytorch.shape # TRUE
```

However, I’m having trouble migrating the weights and biases. Setting the *weights* is pretty straightforward, as it’s essentially a matrix transposition between Keras and PyTorch (e.g. Keras stores the input kernel as `(input_dim, 4 * units)`, so its transpose is what PyTorch’s `(4 * hidden_size, input_size)` `weight_ih_l0` expects).
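To be concrete, this is roughly how I'm transferring the forward-direction weights (the arrays below are random placeholders for what Keras' `layer.get_weights()` returns as `[kernel, recurrent_kernel, bias]`; the `_reverse` parameters of the backward direction would be handled the same way):

```python
import numpy as np
import torch
import torch.nn as nn

input_size, hidden = 128, 256  # made-up sizes for illustration

# Stand-ins for the arrays exported from Keras
kernel = np.random.randn(input_size, 4 * hidden).astype(np.float32)
recurrent_kernel = np.random.randn(hidden, 4 * hidden).astype(np.float32)

lstm = nn.LSTM(input_size, hidden, batch_first=True, bidirectional=True)

with torch.no_grad():
    # Keras: (in_features, 4*units)  ->  PyTorch: (4*hidden, in_features)
    lstm.weight_ih_l0.copy_(torch.from_numpy(kernel.T))
    lstm.weight_hh_l0.copy_(torch.from_numpy(recurrent_kernel.T))
```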

The one that’s causing me some headache is that PyTorch has *two* bias terms but Keras has **one**.

Judging by the equations in the docs, my understanding is that since the two biases are simply summed, we can set `bias_ih_l0` to the value from the pre-trained model and the corresponding `bias_hh_l0` to zero. However, splitting the value across the two bias terms (e.g. using something like the snippet below) gives slightly different results in the output.

```
alpha = torch.tensor([1.0, 0.9, 0.8 ... ])  # one coefficient per bias element
bias = torch.from_numpy(keras_weights['bias'])
with torch.no_grad():  # copy_ avoids "cannot assign as parameter" errors
    lstm.bias_ih_l0.copy_(bias * alpha)
    lstm.bias_hh_l0.copy_(bias * (1 - alpha))
```
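For reference, here's a minimal, self-contained version of the experiment I'm running (all sizes and the `alpha` split are made up); my understanding is that the two splits should agree up to float rounding, since the gates only ever see `bias_ih + bias_hh`:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
hidden = 8
bias = torch.randn(4 * hidden)  # stand-in for the single Keras bias

lstm_a = nn.LSTM(4, hidden, batch_first=True)
lstm_b = nn.LSTM(4, hidden, batch_first=True)
lstm_b.load_state_dict(lstm_a.state_dict())  # share weights; only the bias split differs

alpha = torch.rand(4 * hidden)  # arbitrary elementwise split
with torch.no_grad():
    lstm_a.bias_ih_l0.copy_(bias)
    lstm_a.bias_hh_l0.zero_()
    lstm_b.bias_ih_l0.copy_(bias * alpha)
    lstm_b.bias_hh_l0.copy_(bias * (1 - alpha))

x = torch.randn(2, 5, 4)
out_a, _ = lstm_a(x)
out_b, _ = lstm_b(x)
print(torch.allclose(out_a, out_b, atol=1e-5))  # expect a match up to rounding
```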

I also want to ask if the `tanh` activation from Keras’ LSTM is something we have to call as a separate step, e.g.

```
o_pytorch, (h, c) = lstm(x)  # nn.LSTM returns (output, (h_n, c_n))
# o_pytorch = torch.tanh(o_pytorch)  # is this necessary?
```
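Concretely, here's a runnable version of what I mean (all sizes are made up; the commented-out line is the extra activation I'm unsure about):

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=4, hidden_size=8, batch_first=True)
x = torch.randn(2, 5, 4)  # (batch, seq_len, features)
o, (h, c) = lstm(x)
# o = torch.tanh(o)  # needed to match Keras, or already applied internally?
print(o.shape)  # torch.Size([2, 5, 8])
```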

Would really appreciate your help. Thanks everyone!