How to get gradient of hidden state of LSTM?

Hi, I have used PyTorch for some time, but when I read the example code for the language model, I'm quite confused about how to get the gradient of the hidden state.

    hidden = model.init_hidden(args.batch_size)
    for batch, i in enumerate(range(0, train_data.size(0) - 1, args.bptt)):
        data, targets = get_batch(train_data, i)
        # Starting each batch, we detach the hidden state from how it was previously produced.
        # If we didn't, the model would try backpropagating all the way to start of the dataset.
        hidden = repackage_hidden(hidden)
        model.zero_grad()
        output, hidden = model(data, hidden)
        loss = criterion(output.view(-1, ntokens), targets)
        loss.backward()
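
(For reference, repackage_hidden in that example just detaches the hidden state tensors from the previous graph; it looks roughly like this:)

    import torch

    def repackage_hidden(h):
        """Wrap hidden states in new tensors, detached from their history."""
        if isinstance(h, torch.Tensor):
            return h.detach()
        return tuple(repackage_hidden(v) for v in h)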

But when I print hidden[0].grad or hidden[1].grad after loss.backward(), I get None. I have tried two approaches to get the gradient:

    hidden[0].register_hook(get_gradient)
    hidden[1].register_hook(get_gradient)

And

    hidden[0].retain_grad()
    hidden[1].retain_grad()

However, neither way works. So how can I get the gradient of the hidden state and the cell state?

There are two things:

  • The variable hidden is overwritten by output, hidden = model(data, hidden), so if you want the initial hidden state, you would have to do output, new_hidden = ... instead.
  • The initial hidden state likely does not require grad, so you need hidden[0].requires_grad_() (and likewise hidden[1] for the cell state) before the forward pass; for the final hidden state, new_hidden[0].retain_grad() is good.

All this assumes you are indeed using an LSTM, so that hidden and new_hidden are (h, c) tuples.
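
Concretely, here is a minimal sketch of the adjusted training step, reusing the names from your snippet (model, data, targets, ntokens, criterion, repackage_hidden):

    hidden = repackage_hidden(hidden)         # detached (h, c) -> both are now leaf tensors
    hidden[0].requires_grad_()                # track the gradient w.r.t. the initial hidden state
    hidden[1].requires_grad_()                # ... and the initial cell state
    model.zero_grad()
    output, new_hidden = model(data, hidden)  # keep `hidden` around instead of overwriting it
    new_hidden[0].retain_grad()               # retain_grad() stores .grad on non-leaf tensors
    new_hidden[1].retain_grad()
    loss = criterion(output.view(-1, ntokens), targets)
    loss.backward()
    print(hidden[0].grad)                     # gradient of the loss w.r.t. the initial hidden state
    print(hidden[1].grad)                     # gradient w.r.t. the initial cell state

Note that new_hidden[0].grad will only be populated if the final hidden state actually feeds into the loss you call backward() on.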

Best regards

Thomas
