Hello! A lot of posts on Keras → PyTorch LSTM migration seem to focus on reproducing the model architecture, but I couldn’t find anything on transferring the trained weights; the closest I’ve found is this SO post, but I would love your help.
Essentially, I have a Keras LSTM that I want to migrate:
# Keras
from tensorflow.keras.layers import LSTM, Bidirectional

model.add(
    Bidirectional(
        LSTM(256, dropout=0.15, recurrent_dropout=0.2, return_sequences=True),
        merge_mode='concat',
    )
)
o_keras = model(x)
In PyTorch, my implementation is simply:
import torch.nn as nn

# nn.LSTM needs both input_size and hidden_size; hidden_size=256 mirrors the Keras units
lstm = nn.LSTM(input_size=x.size(-1), hidden_size=256, batch_first=True, bidirectional=True)
o_pytorch, (h, c) = lstm(x)
The output tensor shapes are identical, which is a good sign:
o_keras.shape == o_pytorch.shape # TRUE
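(With made-up sizes for concreteness: if x is (batch=8, seq_len=50, features=256), both outputs come out as (8, 50, 512), since merge_mode='concat' and bidirectional=True each concatenate the two directions along the feature axis, i.e. 2 * 256 = 512.)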
However, I’m having trouble migrating the weights and biases. Setting the weights is pretty straightforward, as it’s essentially a matrix transposition between Keras and PyTorch (e.g. Keras stores the input kernel as (input_dim, 4 * units), and its transpose is exactly what PyTorch’s (4 * hidden_size, input_size) layout expects).
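For reference, here’s roughly what the forward-direction copy looks like (a sketch: the layer index is hypothetical, and as far as I can tell get_weights() on a Bidirectional LSTM returns kernel, recurrent_kernel, bias for the forward layer followed by the same three for the backward layer, which would go into the *_reverse parameters the same way). Both frameworks appear to order the four gates as input, forget, cell, output, so only the transpose is needed:

import torch

fwd = model.layers[-1].get_weights()[:3]  # hypothetical layer index
keras_weights = dict(zip(['kernel', 'recurrent_kernel', 'bias'], fwd))

with torch.no_grad():
    # Keras kernel (input_dim, 4 * units) -> PyTorch weight_ih_l0 (4 * hidden, input_size)
    lstm.weight_ih_l0.copy_(torch.from_numpy(keras_weights['kernel'].T))
    # Keras recurrent_kernel (units, 4 * units) -> PyTorch weight_hh_l0 (4 * hidden, hidden)
    lstm.weight_hh_l0.copy_(torch.from_numpy(keras_weights['recurrent_kernel'].T))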
The one that’s causing me some headache is the bias: PyTorch has two bias terms per direction (bias_ih_l0 and bias_hh_l0) but Keras has one.
Judging by the gate equations in the docs, e.g. i_t = sigmoid(W_ii x_t + b_ii + W_hi h_{t-1} + b_hi), the two biases only ever appear as a sum, so my understanding is that we can set bias_ih_l0 to the pre-trained Keras value and the corresponding bias_hh_l0 to 0.
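That exact split would look like this (a sketch, assuming keras_weights['bias'] is the forward-direction bias of shape (4 * units,)):

bias = torch.from_numpy(keras_weights['bias'])
with torch.no_grad():
    lstm.bias_ih_l0.copy_(bias)  # full pre-trained Keras bias
    lstm.bias_hh_l0.zero_()      # contributes nothing to the sum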
However, splitting the bias differently, e.g. with per-element factors like the ones below, gives slightly different results in the output, even though bias * alpha + bias * (1 - alpha) should sum back to the same bias:

alpha = torch.tensor([1.0, 0.9, 0.8, ...])  # per-element split factors, truncated here
bias = torch.from_numpy(keras_weights['bias'])
with torch.no_grad():
    lstm.bias_ih_l0.copy_(bias * alpha)
    lstm.bias_hh_l0.copy_(bias * (1 - alpha))
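(“Slightly different” here comes from a comparison along these lines — a sketch, assuming eager TensorFlow tensors and a tolerance I picked arbitrarily:)

import numpy as np
np.testing.assert_allclose(o_keras.numpy(), o_pytorch.detach().numpy(), atol=1e-6)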
I also want to ask whether the tanh activation from Keras’ LSTM is something we have to apply as a separate step in PyTorch, e.g.

o_pytorch, (h, c) = lstm(x)
# o_pytorch = nn.Tanh()(o_pytorch)  # is this necessary?
Would really appreciate your help. Thanks everyone!