Set forget gate bias of LSTM

I’d like to set forget gate bias of LSTM to a specific value, for example, 1.0. I’ve been looking at torch.nn.LSTM but could not find a way to do it. Do I have to write my own LSTM layer to do this? Could someone kindly provide a quick example how this can be done? Thank you so much!


It’s not super convenient, but we guarantee that a bias vector of each LSTM layer is structured like this:

[b_ig | b_fg | b_gg | b_og]

You can find that in the Variables section of the LSTM docs.

So, to set the forget gate bias, you'd need to select the bias parameters and set the slice from 1/4 to 1/2 of their length to the desired value.
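As a sketch of the above (assuming a current torch.nn.LSTM; `named_parameters` makes the filtering a bit tidier, and `no_grad` is needed to modify parameters in place):

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2)

# Each bias vector is laid out as [b_ig | b_fg | b_gg | b_og],
# so the forget-gate slice runs from n/4 to n/2 of its length n.
with torch.no_grad():
    for name, param in lstm.named_parameters():
        if "bias" in name:
            n = param.size(0)
            param[n // 4 : n // 2].fill_(1.0)
```

Note that PyTorch adds bias_ih and bias_hh inside the cell, so setting both to 1.0 gives an effective forget-gate bias of 2.0; set only one of them (or use 0.5 each) if you want exactly 1.0.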


Adam, thank you so much for getting back to me. May I ask another follow-up question (this may sound stupid) - how do I access the Variables of the LSTM? The API does not seem to provide direct access to modify Variables. Do I have to modify the source code of RNNBase? Best,

Suppose you have an LSTM model with two layers:

l = LSTM(10, 20, 2)

Then all its parameters names are in this list of lists:

>>> l._all_weights
[['weight_ih_l0', 'weight_hh_l0', 'bias_ih_l0', 'bias_hh_l0'],
 ['weight_ih_l1', 'weight_hh_l1', 'bias_ih_l1', 'bias_hh_l1']]

And you can access them by name, e.g.:

l.bias_ih_l0.data[20:40].fill_(0)

(In this case biases are arrays of length 80.)

If you want to set the bias for all forget gates to 1:

for names in l._all_weights:
    for name in filter(lambda n: "bias" in n, names):
        bias = getattr(l, name)
        n = bias.size(0)
        start, end = n // 4, n // 2
        bias.data[start:end].fill_(1.)

Thank you so much. I appreciate it!

I know this is not the right group, but is there a similar procedure to do this in Torch? I am having a tough time figuring it out. Any help is deeply appreciated.

@akhil_reddy Check the example code for the nninit library with nn.FastLSTM. There’s a line with the comment “High forget gate bias”. I think it’s the same for nn.LSTM but not 100% sure.

Can the command be used after initialising all the parameters of the model (which consists of the lstm layer) using getParameters() ?

Out of curiosity, how are the weights and biases of each gate ordered in FastLSTM?
I see that the forget gate bias is the fourth quadrant of the biases from the command

@akhil_reddy Yes, it should be fine, but I would still check manually by initialising an LSTM, using :getParameters(), updating the bias, and then checking the unrolled weights.

Ah actually that’s an error on my part which I will correct - seems like it’s the 3rd out of 4 blocks according to the source code. Your best bet is to check the source code if you have any doubts, and raise an issue with the rnn repo if you’re still stuck as this is not a PyTorch issue.

Thank you so much for the help, will proceed accordingly. 🙂

When using a GRU, I would like to initialize the bias reset gate to -1 and would appreciate tips for finding the proper start and end values!
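A sketch following the same pattern for a GRU (PyTorch documents the GRU bias layout as [b_ir | b_iz | b_in], so the reset gate is the first third of each bias vector; the sizes here are just placeholders):

```python
import torch
import torch.nn as nn

gru = nn.GRU(input_size=10, hidden_size=20, num_layers=1)

# GRU bias layout per the docs: [b_ir | b_iz | b_in],
# so the reset-gate slice runs from 0 to n/3 of its length n.
with torch.no_grad():
    for name, param in gru.named_parameters():
        if "bias" in name:
            n = param.size(0)
            param[: n // 3].fill_(-1.0)
```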

Hi Adam,
would you be able to specify what do you mean by

Why those specific indices? Wouldn't lstm.bias_hh_l0 already be all the biases of the forget gate of the first layer?
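For what it's worth, bias_hh_l0 holds the biases of all four gates concatenated (length 4 * hidden_size), with the forget gate occupying only the second quarter; a quick check:

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(10, 20, 2)

# bias_hh_l0 concatenates [b_ig | b_fg | b_gg | b_og],
# so its length is 4 * hidden_size = 80, not hidden_size = 20.
print(lstm.bias_hh_l0.shape)  # torch.Size([80])
```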

Great! That's what we all need 😀