I have an LSTM language model that predicts the next word, and it normally trains on sentences, or rather batches of sentences. For an experiment I would like the model to backpropagate the gradient for one word of a sentence at a time and optimize accordingly. For that, I would think the following code would be sufficient:
```python
for x in range(0, test_data.size(0) - 1, args.bptt):
    print()
    sent_data, targets = data.get_batch(test_data, x, args)
    first_eos = int((sent_data == eos).nonzero()[1][0])  # Cut off data at end of sentence
    sent_data = sent_data[:first_eos]
    targets = targets[:first_eos]
    #print(sent_data)
    #print(sent_data.size())
    print(model)
    hidden = model.init_hidden(args.eval_batch_size)
    for word, target in zip(sent_data, targets):
        model.zero_grad()
        output, hidden = model(word.view(1, 1), hidden)
        loss = crit(log_softmax(output.view(-1, ntypes), dim=1), target.view(1))
        loss.backward()
        optimizer.step()
```
But this raises a RuntimeError, specifically:
RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
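My understanding (and this may be where I am wrong) is that the hidden state returned by the model still references the graph of the previous word, so the next backward() call has to go through a graph whose buffers were already freed. A small standalone sketch that reproduces the same error for me (the LSTM, sizes, and optimizer here are made up, not the ones from my model):

```python
import torch
import torch.nn as nn

# Standalone sketch, only to illustrate the failure mode.
rnn = nn.LSTM(input_size=4, hidden_size=4)
optimizer = torch.optim.SGD(rnn.parameters(), lr=0.1)
hidden = (torch.zeros(1, 1, 4), torch.zeros(1, 1, 4))

for step in range(2):
    optimizer.zero_grad()
    inp = torch.randn(1, 1, 4)
    out, hidden = rnn(inp, hidden)  # hidden still points into the previous step's graph
    loss = out.sum()
    loss.backward()                 # second iteration: backward reaches into the first
    optimizer.step()                # step's already-freed graph -> same RuntimeError
```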
I am not familiar with retain_graph=True, but as I read it, it should address this issue. However, when I pass retain_graph=True to backward(), I get a new error, namely:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [2400, 300]] is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
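My guess, and I am not sure about this, is that with retain_graph=True the backward pass still reaches into the previous step's graph, but by then optimizer.step() has already updated the LSTM weights in place, which is what the version numbers in the message seem to refer to. A standalone sketch (again with made-up names, not my actual model) that triggers the same kind of error:

```python
import torch

# The tensor here stands in for an LSTM weight matrix.
w = torch.randn(3, requires_grad=True)
loss = (w ** 2).sum()            # backward of ** needs the saved value of w
loss.backward(retain_graph=True)

with torch.no_grad():            # roughly what optimizer.step() does: an in-place update
    w += 0.1

loss.backward()                  # w is now at a newer "version" -> same inplace RuntimeError
```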
I am not certain how to solve this second error, and am looking for help.