Differences between LSTM in PyTorch and Keras

Hi,

I have been porting Keras-written code to PyTorch and ran into a problem that seems a bit tricky (Keras model here). I checked every single layer of both the Keras and PyTorch models, and up to the LSTM layer all outputs are identical when the weights are initialized to the same values in both frameworks (set manually).

However, the LSTM layer behaves differently even though the same weights are given to both models. Below is the code for Keras:

conv_lstm = LSTM(64)(conv_reshape)

and below is the code for PyTorch:

def __init__(...):
    ...
    self.LSTM = nn.LSTM(192, 64, 1, batch_first = True)
    ....

def forward(self, ...):
    ....
    lstm, _ = self.LSTM(output)
    lstm = lstm[:,-1,:]
    ....
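For reference, slicing `lstm[:, -1, :]` this way should match the default Keras `LSTM(64)` output (`return_sequences=False`), since with `batch_first=True` the last timestep of the output equals the final hidden state. A quick sketch with assumed shapes:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(192, 64, 1, batch_first=True)
x = torch.randn(2, 10, 192)          # (batch, seq_len, features) - assumed sizes

out, (h_n, c_n) = lstm(x)
print(out.shape)                     # torch.Size([2, 10, 64]) - every timestep
last = out[:, -1, :]                 # last timestep only
print(torch.allclose(last, h_n[0]))  # True: same as the final hidden state
```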

If anyone has any idea about this, please help me out! Thanks in advance.

That sounds like a good approach!
Could you share the code showing how you set the nn.LSTM parameters and compare the outputs?
My guess is that the parameter initialization might not have worked properly.


Thanks for the reply ptrblck!

So, the code I wrote for matching the weights is as follows:

model.LSTM.weight_ih_l0.data = torch.nn.Parameter(torch.FloatTensor(np.transpose(weights[28])).contiguous().to(device))
model.LSTM.weight_hh_l0.data = torch.nn.Parameter(torch.FloatTensor(np.transpose(weights[29])).contiguous().to(device))

model.LSTM.bias_ih_l0.data = torch.from_numpy(weights[30]).to(device)
model.LSTM.bias_hh_l0.data = torch.from_numpy(weights[30]).to(device)

Basically, the weights from the Keras LSTM are stored in the list ‘weights’, and since Keras has only one bias (with the same shape as each of the two biases in the PyTorch LSTM), the same values are assigned to both biases.
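One detail worth keeping in mind here: inside the PyTorch LSTM cell the two bias terms are simply added, so the effective bias is `bias_ih + bias_hh`. A small sketch (with made-up sizes) showing that moving the full sum into one bias and zeroing the other leaves the output unchanged:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(1, 5, 3)  # (batch, seq_len, features) - made-up sizes

# Reference LSTM; its effective bias is bias_ih_l0 + bias_hh_l0.
ref = nn.LSTM(3, 4, 1, batch_first=True)

# Copy of ref with the whole effective bias moved into bias_ih
# and bias_hh zeroed; the cell only ever uses the sum, so the
# outputs must match.
dup = nn.LSTM(3, 4, 1, batch_first=True)
with torch.no_grad():
    dup.weight_ih_l0.copy_(ref.weight_ih_l0)
    dup.weight_hh_l0.copy_(ref.weight_hh_l0)
    dup.bias_ih_l0.copy_(ref.bias_ih_l0 + ref.bias_hh_l0)
    dup.bias_hh_l0.zero_()

out_ref, _ = ref(x)
out_dup, _ = dup(x)
print(torch.allclose(out_ref, out_dup, atol=1e-6))  # True
```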

After running the code above, I checked the parameters by printing ‘model.LSTM.weight~ and model.LSTM.bias~’, and all the weights seem to be set properly. If you have any idea where this difference in outputs is coming from, please let me know!

Thanks

That doesn’t sound right. Could you check the shape of the bias in the Keras model and compare it to both bias parameters in the PyTorch implementation?

This is true: a Keras LSTM layer has only one bias, while an LSTM in torch has two. I ran into the same issue and thought I'd share the solution here for anyone facing it. The way you are setting the weights is correct, but you need to set bias_hh_l0 to a zero vector. To convert a pretrained LSTM layer from Keras to torch, set up the weights like this:

model.LSTM.weight_ih_l0.data = torch.nn.Parameter(torch.FloatTensor(np.transpose(weights[28])).contiguous().to(device))
model.LSTM.weight_hh_l0.data = torch.nn.Parameter(torch.FloatTensor(np.transpose(weights[29])).contiguous().to(device))

model.LSTM.bias_ih_l0.data = torch.from_numpy(weights[30]).to(device)
model.LSTM.bias_hh_l0.data = torch.zeros_like(model.LSTM.bias_hh_l0)
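
Putting it together, a conversion helper might look like the sketch below. The function name and the Keras-style shapes are illustrative (kernel `(input, 4*hidden)`, recurrent_kernel `(hidden, 4*hidden)`, bias `(4*hidden,)`); both frameworks stack the gates in the same i, f, c/g, o order, so a plain transpose is enough:

```python
import numpy as np
import torch
import torch.nn as nn

def load_keras_lstm(lstm, kernel, recurrent_kernel, bias):
    """Copy Keras LSTM weights into a single-layer nn.LSTM.

    kernel:           (input_size, 4*hidden_size)
    recurrent_kernel: (hidden_size, 4*hidden_size)
    bias:             (4*hidden_size,)
    """
    with torch.no_grad():
        lstm.weight_ih_l0.copy_(torch.from_numpy(kernel.T))
        lstm.weight_hh_l0.copy_(torch.from_numpy(recurrent_kernel.T))
        lstm.bias_ih_l0.copy_(torch.from_numpy(bias))
        lstm.bias_hh_l0.zero_()  # Keras has no second bias

lstm = nn.LSTM(192, 64, 1, batch_first=True)
rng = np.random.default_rng(0)  # stand-in for real Keras weights
load_keras_lstm(lstm,
                rng.standard_normal((192, 256), dtype=np.float32),
                rng.standard_normal((64, 256), dtype=np.float32),
                rng.standard_normal(256, dtype=np.float32))
print(lstm.bias_hh_l0.abs().sum().item())  # 0.0
```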