Initialize LSTM hidden-to-hidden weights with the identity matrix

Hello! I have an LSTM which looks like this:

class BD_LSTM(nn.Module):
    def __init__(self, nl):
        super().__init__()
        self.nl = nl
        self.rnn = nn.LSTM(1, n_hidden, nl, bidirectional=True)
        self.l_out = nn.Linear(n_hidden*2, n_classes)
    def forward(self, input):
        outp, h = self.rnn(input.view(len(input), bs, -1), self.h)
        return F.log_softmax(self.l_out(outp), dim=2)
    def init_hidden(self, bs):
        self.h = (V(torch.zeros(self.nl*2, bs, n_hidden)),
                  V(torch.zeros(self.nl*2, bs, n_hidden)))

model = BD_LSTM(2).cuda()

I want to initialize the hidden-to-hidden weights of the LSTM to the identity matrix (I read this helps convergence). Looking at the LSTM documentation, one of the attributes (which should be the one I need) is weight_hh_l. This is the section relevant to my question:

    weight_ih_l[k] : the learnable input-hidden weights of the :math:`\text{k}^{th}` layer
        `(W_ii|W_if|W_ig|W_io)`, of shape `(4*hidden_size x input_size)`
    weight_hh_l[k] : the learnable hidden-hidden weights of the :math:`\text{k}^{th}` layer
        `(W_hi|W_hf|W_hg|W_ho)`, of shape `(4*hidden_size x hidden_size)`
    bias_ih_l[k] : the learnable input-hidden bias of the :math:`\text{k}^{th}` layer
        `(b_ii|b_if|b_ig|b_io)`, of shape `(4*hidden_size)`
    bias_hh_l[k] : the learnable hidden-hidden bias of the :math:`\text{k}^{th}` layer
        `(b_hi|b_hf|b_hg|b_ho)`, of shape `(4*hidden_size)`

However, when I access model.rnn.weight_hh_l, I get this error:

AttributeError: 'LSTM' object has no attribute 'weight_hh_l'

What am I doing wrong? Thank you!

You don't need to include the brackets; just suffix the attribute name with the layer index directly:

for layer 0: model.rnn.weight_hh_l0
for layer 1: model.rnn.weight_hh_l1
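Once you can reach the parameter, the identity initialization itself can be sketched like this (a minimal standalone example, with an arbitrary n_hidden; note that weight_hh_l0 stacks the four gate matrices, so the identity block is repeated four times):

```python
import torch
import torch.nn as nn

n_hidden = 8  # arbitrary size for this sketch

# Same layout as the model in the question: input size 1, 2 layers, bidirectional
rnn = nn.LSTM(1, n_hidden, 2, bidirectional=True)

# weight_hh_l0 holds the four gate matrices (W_hi|W_hf|W_hg|W_ho) stacked,
# shape (4*n_hidden, n_hidden), so tile the identity four times along dim 0.
with torch.no_grad():
    rnn.weight_hh_l0.copy_(torch.eye(n_hidden).repeat(4, 1))
```

Since the model is bidirectional, each layer also has a backward-direction parameter (weight_hh_l0_reverse, weight_hh_l1_reverse), which you may want to initialize the same way.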