Hi, I have used Pytorch for some time. But when I read the example code about language model, I’m quite confused about how to get gradient of hidden state.
hidden = model.init_hidden(args.batch_size)
for batch, i in enumerate(range(0, train_data.size(0) - 1, args.bptt)):
data, targets = get_batch(train_data, i)
# Starting each batch, we detach the hidden state from how it was previously produced.
# If we didn't, the model would try backpropagating all the way to start of the dataset.
hidden = repackage_hidden(hidden)
model.zero_grad()
output, hidden = model(data, hidden)
loss = criterion(output.view(-1, ntokens), targets)
loss.backward()
But when I print hidden[0].grad or hidden[1].grad after loss.backward(), I got None. I have tried two approaches to get gradient.
hidden[0].register_hook(get_gradient)
hidden[1].register_hook(get_gradient)
And
hidden[0].retain_grad()
hidden[1].retain_grad()
However, neither way works. So how can get gradient of hidden state and ceil state?