I have been porting Keras code to PyTorch and ran into a problem that seems a bit tricky to me (Keras model here). I've checked every single layer of the model in both Keras and PyTorch, and up to the LSTM layer all of the outputs are the same when the weights are manually initialized to the same values in both frameworks.
However, the LSTM layer acts differently even though the same weights are given to both models. Below is the code for Keras.
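For context on the layout involved: Keras stores an LSTM's parameters as a `kernel` of shape `(input_dim, 4*units)`, a `recurrent_kernel` of shape `(units, 4*units)`, and a single `bias` of shape `(4*units,)`, with the four gates packed in the order i, f, c (candidate), o. A minimal NumPy sketch of one step under that layout (function and variable names are my own, not from the original post):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def keras_lstm_step(x, h, c, kernel, recurrent_kernel, bias):
    """One LSTM step using the Keras parameter layout:
    kernel: (input_dim, 4*units), recurrent_kernel: (units, 4*units),
    bias: (4*units,); gates packed in the order i, f, c(candidate), o."""
    units = h.shape[-1]
    # Single affine transform; the single bias is added exactly once.
    z = x @ kernel + h @ recurrent_kernel + bias
    i, f, g, o = (z[..., k * units:(k + 1) * units] for k in range(4))
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new
```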
That sounds like a good approach!
Could you share the code showing how you set the nn.LSTM parameters and compare the outputs?
My guess is that the parameter initialization didn't take effect properly.
Basically, the weights from the Keras LSTM are in the list 'weights', and since Keras has only one bias (the same shape as each of the two biases in the PyTorch LSTM), the same values are assigned to both biases.
After running the code above, I checked the weights of the model by inspecting 'model.LSTM.weight~' and 'model.LSTM.bias~', and they all seem to be set properly. If you have any idea where this difference in outputs is coming from, please let me know!
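In code, the assignment described above would look roughly like this (a sketch; the sizes are hypothetical, and `weights` stands in for the list the Keras layer's `get_weights()` would return, simulated here with random arrays):

```python
import numpy as np
import torch
import torch.nn as nn

input_size, hidden_size = 8, 16  # hypothetical sizes
model_lstm = nn.LSTM(input_size, hidden_size)

# Stand-in for keras_layer.get_weights():
# [kernel (input, 4*units), recurrent_kernel (units, 4*units), bias (4*units,)]
rng = np.random.default_rng(0)
weights = [
    rng.standard_normal((input_size, 4 * hidden_size)).astype(np.float32),
    rng.standard_normal((hidden_size, 4 * hidden_size)).astype(np.float32),
    rng.standard_normal(4 * hidden_size).astype(np.float32),
]

with torch.no_grad():
    # Keras kernels are (in, 4*units); torch expects (4*units, in) -> transpose.
    model_lstm.weight_ih_l0.copy_(torch.from_numpy(weights[0].T))
    model_lstm.weight_hh_l0.copy_(torch.from_numpy(weights[1].T))
    # The single Keras bias is copied into BOTH torch biases, as described.
    model_lstm.bias_ih_l0.copy_(torch.from_numpy(weights[2]))
    model_lstm.bias_hh_l0.copy_(torch.from_numpy(weights[2]))
```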
That doesn’t sound right. Could you check the shape of the bias in the Keras model and compare it to both bias parameters in the PyTorch implementation?
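For reference, a fresh `nn.LSTM` exposes two bias parameters, each of shape `(4*hidden_size,)`, i.e. each matching the shape of the single Keras bias (the sizes below are hypothetical):

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16)  # hypothetical sizes
# Each torch bias has the same shape as Keras's single (4*units,) bias:
print(lstm.bias_ih_l0.shape)  # torch.Size([64])
print(lstm.bias_hh_l0.shape)  # torch.Size([64])
```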
It's true that the Keras LSTM layer has only one bias while the LSTM in torch has two. I faced this issue myself and thought I'd share the fix here to help others. The way you are setting the weights is correct, but you need to set bias_hh_l0 to a zero vector rather than duplicating the Keras bias, because torch adds bias_ih and bias_hh together, so copying the same bias into both effectively doubles it. To convert a pretrained LSTM layer from Keras to torch, set up the torch weights like this:
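A sketch of that conversion (the variable names and sizes are placeholders; in a real port, `weights` would come from `keras_layer.get_weights()`, simulated here with random arrays). Note that both frameworks pack the gates in the same order, i, f, g (cell candidate), o, so no gate reordering is needed:

```python
import numpy as np
import torch
import torch.nn as nn

input_size, hidden_size = 8, 16  # placeholders: use your model's sizes
lstm = nn.LSTM(input_size, hidden_size)

# weights = keras_layer.get_weights() would return:
# [kernel (input, 4*units), recurrent_kernel (units, 4*units), bias (4*units,)]
rng = np.random.default_rng(0)
weights = [
    rng.standard_normal((input_size, 4 * hidden_size)).astype(np.float32),
    rng.standard_normal((hidden_size, 4 * hidden_size)).astype(np.float32),
    rng.standard_normal(4 * hidden_size).astype(np.float32),
]

with torch.no_grad():
    # Keras kernels are (in, 4*units); torch expects (4*units, in) -> transpose.
    lstm.weight_ih_l0.copy_(torch.from_numpy(weights[0].T))
    lstm.weight_hh_l0.copy_(torch.from_numpy(weights[1].T))
    lstm.bias_ih_l0.copy_(torch.from_numpy(weights[2]))
    lstm.bias_hh_l0.zero_()  # torch adds both biases, so the second must be zero
```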