Hi, I have used PyTorch for some time, but when I read the example code for the word language model, I'm quite confused about how to get the gradient of the hidden state.
```python
hidden = model.init_hidden(args.batch_size)
for batch, i in enumerate(range(0, train_data.size(0) - 1, args.bptt)):
    data, targets = get_batch(train_data, i)
    # Starting each batch, we detach the hidden state from how it was previously produced.
    # If we didn't, the model would try backpropagating all the way to start of the dataset.
    hidden = repackage_hidden(hidden)
    model.zero_grad()
    output, hidden = model(data, hidden)
    loss = criterion(output.view(-1, ntokens), targets)
    loss.backward()
```
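For context, `repackage_hidden` in that example detaches the hidden state so that backprop stops at the current batch boundary, as the comment says. A sketch of its typical definition (my assumption of how it's written; check the actual script):

```python
import torch

def repackage_hidden(h):
    """Wrap hidden states in new Tensors, detached from their history,
    so gradients do not flow back past the start of the current batch."""
    if isinstance(h, torch.Tensor):
        return h.detach()
    # LSTMs carry a (hidden, cell) tuple; detach each element recursively.
    return tuple(repackage_hidden(v) for v in h)

# Detaching severs the autograd graph: the result no longer requires grad.
h = torch.zeros(1, 2, 3, requires_grad=True) * 1.0  # non-leaf tensor
new_h = repackage_hidden((h, h))
print(new_h[0].requires_grad)
```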
But when I print `hidden.grad` after `loss.backward()`, I get `None`. I have tried two approaches to get the gradient.
However, neither way works. So how can I get the gradient of the hidden state and cell state?
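In case it helps to sketch the usual fix: the hidden and cell states returned by an RNN are non-leaf tensors, and PyTorch frees their `.grad` unless you call `.retain_grad()` on them before `backward()`. A minimal, self-contained example (a toy LSTM, not the language-model script itself):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=4, hidden_size=3)
x = torch.randn(5, 2, 4)                       # (seq_len, batch, input_size)
h0 = torch.zeros(1, 2, 3, requires_grad=True)  # leaf tensors: .grad works directly
c0 = torch.zeros(1, 2, 3, requires_grad=True)

output, (hn, cn) = lstm(x, (h0, c0))
# hn and cn are non-leaf; without retain_grad() their .grad stays None.
hn.retain_grad()
cn.retain_grad()

# Make the loss depend on hn/cn so a gradient actually flows into them.
loss = output.sum() + hn.sum() + cn.sum()
loss.backward()

print(h0.grad.shape)  # leaf gradient, available as usual
print(hn.grad is None)
print(cn.grad is None)
```

An alternative that avoids retaining the gradient is `hn.register_hook(lambda g: ...)`, which fires with the gradient during the backward pass.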