Trying to update parameters in a loop gives the following error: One of the variables needed for gradient computation has been modified by an inplace operation

Hi, I’m implementing a policy gradient method in PyTorch and wanted to move the network update into the loop, but then it stopped working. I’m still a PyTorch newbie, so sorry if the explanation is obvious. I’m using PyTorch 1.6 (built from source).

Here is the original code that works fine:

self.policy.optimizer.zero_grad()
G = T.tensor(G, dtype=T.float).to(self.policy.device)

# accumulate the loss over the whole episode, then do a single update
loss = 0
for g, logprob in zip(G, self.action_memory):
    loss += -g * logprob

loss.backward()
self.policy.optimizer.step()

And this is a change I was trying to make that doesn’t work:

G = T.tensor(G, dtype=T.float).to(self.policy.device)

# attempt to update the network after every step instead of once per episode
for g, logprob in zip(G, self.action_memory):
    loss = -g * logprob
    self.policy.optimizer.zero_grad()

    loss.backward()
    self.policy.optimizer.step()

This second snippet gives the following error:

File "g:\VScode_projects\pytorch_shenanigans\policy_gradient.py", line 86, in learn
    loss.backward()
  File "G:\Anaconda3\envs\pytorch_env\lib\site-packages\torch\tensor.py", line 185, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "G:\Anaconda3\envs\pytorch_env\lib\site-packages\torch\autograd\__init__.py", line 127, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [128, 4]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

I read that this RuntimeError often has to do with needing to clone() something, because the same tensor is used to compute itself, but I can’t make heads or tails of what is wrong in my case.

I’m not familiar with your use case and don’t know how all tensors are calculated.
However, based on the error I guess you might be updating the parameters and trying to reuse an already “stale” forward activation as described in this post.
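Here is a small, self-contained toy example (not your code; a made-up two-layer model with arbitrary shapes) that should run into the same RuntimeError and might make the failure mode clearer: the forward passes are done up front, and the step() inside the loop then modifies, in-place, parameters that the remaining graphs still need for their backward pass.

import torch
import torch.nn as nn

# toy two-layer net: the second layer's (transposed) weight is saved in the
# graph, because it is needed to backprop into the first layer
model = nn.Sequential(nn.Linear(4, 8), nn.Linear(8, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1)

# "precalculate" two forward passes up front (like self.action_memory)
losses = [model(torch.randn(1, 4)).sum() for _ in range(2)]

for loss in losses:
    opt.zero_grad()
    loss.backward()
    # in-place update of the weights that the other, still-pending graph references;
    # the second backward() should then fail with the
    # "modified by an inplace operation" RuntimeError
    opt.step()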

Thanks.

Colloquially speaking, the forward passes are “precalculated” and stored in self.action_memory in my case. The algorithm calls the snippet below many times during an RL episode, and the training loop from my question only takes place afterwards.

# store the log-probability of the sampled action for later
probabilities = F.softmax(self.policy.forward(observation), dim=-1)
action_probs = T.distributions.Categorical(probabilities)
action = action_probs.sample()
log_probs = action_probs.log_prob(action)
self.action_memory.append(log_probs)

So, in other words, after one step() the next backward() call doesn’t make much sense, because its loss was calculated using the old parameters? Am I understanding this correctly?

If so, one way to fix this error would be to execute the forward passes inside the loop as well, right?

Sorry if my train of thought (or wording) is confusing.

Yes, that’s correct. In addition to this, the intermediate activations were also calculated using the “old” parameters. The backward pass would then use the new parameters together with the old intermediates to calculate the gradients for the new parameters, which would be wrong.

Yes, that would be an alternative to your first approach (calling backward() and step() once after the loop).
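An untested sketch of that variant could look like the snippet below. It assumes the episode’s observations and sampled actions are stored as well (self.state_memory and self.action_taken_memory are made-up names here), so the log-probability can be recomputed with the current parameters inside the loop; the T/F aliases and self.policy are taken from your code.

G = T.tensor(G, dtype=T.float).to(self.policy.device)

for g, obs, action in zip(G, self.state_memory, self.action_taken_memory):
    self.policy.optimizer.zero_grad()

    # fresh forward pass with the current parameters
    probabilities = F.softmax(self.policy.forward(obs), dim=-1)
    log_prob = T.distributions.Categorical(probabilities).log_prob(action)

    loss = -g * log_prob
    loss.backward()
    self.policy.optimizer.step()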

Don’t worry, it’s not at all confusing. :wink:
