I have an LSTM language model that predicts the next word, and it normally trains on sentences, or rather batches of sentences. For an experiment I would like the model to backpropagate the gradient for one word of a sentence at a time and optimize accordingly. For that, I would think the following code would be sufficient:
```python
for x in range(0, test_data.size(0) - 1, args.bptt):
    print()
    sent_data, targets = data.get_batch(test_data, x, args)
    first_eos = int((sent_data == eos).nonzero()[1][0])  # Cut off data at end of sentence
    sent_data = sent_data[:first_eos]
    targets = targets[:first_eos]
    #print(sent_data)
    #print(sent_data.size())
    print(model)
    hidden = model.init_hidden(args.eval_batch_size)
    for word, target in zip(sent_data, targets):
        model.zero_grad()
        output, hidden = model(word.view(1, 1), hidden)
        loss = crit(log_softmax(output.view(-1, ntypes), dim=1), target.view(1))
        loss.backward()
        optimizer.step()
```
But this raises a RuntimeError, specifically:
RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
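My understanding (and this may be where I am wrong) is that the hidden state returned by the model still references the graph of the previous word, so the next backward() call has to go through a graph whose buffers were already freed. A small standalone sketch that reproduces the same error for me (the LSTM, sizes, and optimizer here are made up, not the ones from my model):

```python
import torch
import torch.nn as nn

# Standalone sketch, only to illustrate the failure mode.
rnn = nn.LSTM(input_size=4, hidden_size=4)
optimizer = torch.optim.SGD(rnn.parameters(), lr=0.1)
hidden = (torch.zeros(1, 1, 4), torch.zeros(1, 1, 4))

for step in range(2):
    optimizer.zero_grad()
    inp = torch.randn(1, 1, 4)
    out, hidden = rnn(inp, hidden)  # hidden still points into the previous step's graph
    loss = out.sum()
    loss.backward()                 # second iteration: backward reaches into the first
    optimizer.step()                # step's already-freed graph -> same RuntimeError
```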
I am not familiar with retain_graph=True, but as I read it, it should address this issue. However, when I pass retain_graph=True to backward(), I get a new error, namely:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [2400, 300]] is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
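My guess, and I am not sure about this, is that with retain_graph=True the backward pass still reaches into the previous step's graph, but by then optimizer.step() has already updated the LSTM weights in place, which is what the version numbers in the message seem to refer to. A standalone sketch (again with made-up names, not my actual model) that triggers the same kind of error:

```python
import torch

# The tensor here stands in for an LSTM weight matrix.
w = torch.randn(3, requires_grad=True)
loss = (w ** 2).sum()            # backward of ** needs the saved value of w
loss.backward(retain_graph=True)

with torch.no_grad():            # roughly what optimizer.step() does: an in-place update
    w += 0.1

loss.backward()                  # w is now at a newer "version" -> same inplace RuntimeError
```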
I am not certain how to solve this second error, and am looking for help.