Help spotting inplace operation error

I am trying to implement a policy gradient method (using reward-to-go and a state value baseline). I’m given a template and my job is to fill in the gaps in the template.

When I try to run the code after filling in the gaps, I get an inplace-operation error (shown at the end of this post). The template should not have errors, so the problems I encounter should be due to the code I’ve written.

Essentially, I am asking whether anybody can spot an inplace operation (and suggest a workaround) in the code that I show below.

The parts I have filled in are shown below.

1. A function to calculate the return-to-go, given a sequence of rewards over multiple episodes (a compact sketch of the same recursion is included after item 4)

    def _calc_return(self, r, done):
        '''
        TODO 2.1: Given a tensor of per-timestep rewards (r), and a tensor (done)
        indicating if a timestep is the last timestep of an episode, Output a
        tensor (return_t) containing the return (i.e. reward-to-go) at each timestep.
        '''
        
        gamma = self._discount
        done_indices = torch.nonzero(done, as_tuple=False)
        indices = torch.cat((torch.zeros(1), torch.squeeze(torch.transpose(done_indices, 0, 1)), torch.Tensor([len(r) - 1])))
        return_t_list = []

        for i in range(len(indices) - 1):
            # The first segment starts at its boundary index; later segments
            # start one past the previous episode's final index.
            start = int(indices[i]) if i == 0 else int(indices[i] + 1)
            episode = r[start:int(indices[i + 1] + 1)]
            episode_indices = torch.arange(0, len(episode))
            for j in range(len(episode_indices)):
                episode_to_go = episode[j:]
                gammas = torch.full([len(episode_to_go)], gamma)
                exponents = episode_indices[:len(episode_indices) - j]
                discounts = torch.pow(gammas, exponents)
                discounted_sum = torch.dot(episode_to_go, discounts)
                return_t_list.append(discounted_sum)

        return_t = torch.stack(return_t_list)

        return return_t

2. A function to calculate the “advantage” (reward-to-go minus state value)

    def _calc_adv(self, norm_obs, ret):
        '''
        TODO 2.2: Given the normalized observations (norm_obs) and the return at
        every timestep (ret), output the advantage at each timestep (adv).
        '''
        
        value = self._model.eval_critic(norm_obs)
        value = torch.squeeze(torch.transpose(value, 0, 1))
        adv = ret - value
        return adv

3. A function to calculate the critic loss, i.e. the loss used to train the value function

    def _calc_critic_loss(self, norm_obs, tar_val):
        '''
        TODO 2.3: Given the normalized observations (norm_obs) and the returns at
        every timestep (tar_val), compute a loss for updating the value
        function (critic).
        '''

        value = self._model.eval_critic(norm_obs)
        value = torch.squeeze(torch.transpose(value, 0, 1))
        squared_diff = (tar_val - value)**2
        loss = squared_diff.mean()
        return loss

4. A function to calculate the actor loss, i.e. the loss used to train the policy

    def _calc_actor_loss(self, norm_obs, norm_a, adv):
        '''
        TODO 2.4: Given the normalized observations (norm_obs), normalized
        actions (norm_a), and the advantage at every timestep (adv), compute
        a loss for updating the policy (actor).
        '''
        
        policy = self._model.eval_actor(norm_obs)
        policy_a = policy.log_prob(norm_a)
        loss = -(adv * policy_a).mean()
        return loss
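
As an aside, the reward-to-go recursion in item 1 can also be written as a single backward pass over the rewards. The sketch below is illustrative only (the function name is made up, it is not part of the template, and it assumes the reward tensor does not require gradients):

    import torch

    def reward_to_go(r, done, gamma):
        # Illustrative sketch only (not the template's API): sweep the rewards
        # backwards, resetting the running return whenever `done` marks the
        # last timestep of an episode.
        ret = torch.zeros_like(r)
        running = 0.0
        for t in reversed(range(len(r))):
            if done[t]:
                running = 0.0                  # an episode ends at timestep t
            running = r[t] + gamma * running   # discounted reward-to-go
            ret[t] = running
        return ret

    # e.g. reward_to_go(torch.tensor([1., 1., 1., 1.]),
    #                   torch.tensor([0., 1., 0., 1.]), gamma=0.9)
    # -> tensor([1.9000, 1.0000, 1.9000, 1.0000])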

Now, when I run all the code, I get the following error:

Traceback (most recent call last):
  File "run.py", line 103, in <module>
    main(sys.argv)
  File "run.py", line 93, in main
    train(agent=agent, max_samples=max_samples, out_model_file=out_model_file, 
  File "run.py", line 50, in train
    agent.train_model(max_samples=max_samples, out_model_file=out_model_file, 
  File "/Users/jesse/rl_assignments/learning/base_agent.py", line 57, in train_model
    train_info = self._train_iter()
  File "/Users/jesse/rl_assignments/learning/base_agent.py", line 226, in _train_iter
    train_info = self._update_model()
  File "/Users/jesse/rl_assignments/a2/pg_agent.py", line 144, in _update_model
    actor_info = self._update_actor(actor_batch)
  File "/Users/jesse/rl_assignments/a2/pg_agent.py", line 187, in _update_actor
    loss.backward()
  File "/Users/jesse/anaconda3/envs/rl/lib/python3.8/site-packages/torch/_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/Users/jesse/anaconda3/envs/rl/lib/python3.8/site-packages/torch/autograd/__init__.py", line 154, in backward
    Variable._execution_engine.run_backward(
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [64, 1]], which is output 0 of AsStridedBackward0, is at version 41; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

Essentially, I am guessing that I’ve done an inplace operation somewhere that is messing with the gradient computation. However, I do not know what to look for, and torch.autograd.set_detect_anomaly(True) doesn’t seem helpful.

I appreciate any help.

Hey! Can you also provide the part of the code where you call backward()?
Thanks!

Hello, thanks for replying. I assume you are referring to the code surrounding the loss.backward() call. Here is the function containing that call:

    def _update_actor(self, batch):
        norm_obs = batch["norm_obs"]
        norm_a = batch["norm_action"]
        adv = batch["adv"]
        
        loss = self._calc_actor_loss(norm_obs, norm_a, adv)
        
        info = {
            "actor_loss": loss
        }
        
        if (self._action_bound_weight != 0):
            a_dist = self._model.eval_actor(norm_obs)
            action_bound_loss = self._compute_action_bound_loss(a_dist)
            if (action_bound_loss is not None):
                action_bound_loss = torch.mean(action_bound_loss)
                loss += self._action_bound_weight * action_bound_loss
                info["action_bound_loss"] = action_bound_loss.detach()

        self._actor_optimizer.zero_grad()
        loss.backward()
        self._actor_optimizer.step()
        
        return info

Let me know if you would like to see anything else. Thanks.

loss += self._action_bound_weight * action_bound_loss looks problematic as that would be an in-place operation.

Does changing it to an out-of-place one via
loss = loss + ...
work?

>>> import torch
>>> loss = torch.ones(10, requires_grad=True)
>>> loss += 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.
>>> loss = torch.ones(10, requires_grad=True)
>>> loss = loss + 1 #OK
>>> loss.sum().backward()
>>>

Thanks for replying. In the _update_actor function, I commented out loss += self._action_bound_weight * action_bound_loss and put in loss = loss + self._action_bound_weight * action_bound_loss.

Looks like I get a very similar (possibly identical) error:

Traceback (most recent call last):
  File "run.py", line 103, in <module>
    main(sys.argv)
  File "run.py", line 93, in main
    train(agent=agent, max_samples=max_samples, out_model_file=out_model_file, 
  File "run.py", line 50, in train
    agent.train_model(max_samples=max_samples, out_model_file=out_model_file, 
  File "/Users/jesse/rl_assignments/learning/base_agent.py", line 57, in train_model
    train_info = self._train_iter()
  File "/Users/jesse/rl_assignments/learning/base_agent.py", line 226, in _train_iter
    train_info = self._update_model()
  File "/Users/jesse/rl_assignments/a2/pg_agent.py", line 144, in _update_model
    actor_info = self._update_actor(actor_batch)
  File "/Users/jesse/rl_assignments/a2/pg_agent.py", line 188, in _update_actor
    loss.backward()
  File "/Users/jesse/anaconda3/envs/rl/lib/python3.8/site-packages/torch/_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/Users/jesse/anaconda3/envs/rl/lib/python3.8/site-packages/torch/autograd/__init__.py", line 154, in backward
    Variable._execution_engine.run_backward(
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [64, 1]], which is output 0 of AsStridedBackward0, is at version 41; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

I am not sure if this is helpful, but in _calc_actor_loss, if I comment out the code I shared in my original post and just use

    def _calc_actor_loss(self, norm_obs, norm_a, adv):
        loss = torch.zeros(1)
        return loss

then the code runs without error, although obviously the computations will be garbage.

Thanks for your help.

Well, it’s difficult to say where the problem is without any visibility into how
policy.log_prob and self._model.eval_actor are implemented.

The last line doesn’t seem to be obviously wrong

>>> a = torch.ones(10, requires_grad=True)
>>> b = torch.ones(10, requires_grad=True)
>>> loss = -(a * b).mean()
>>> loss.backward()
>>>

The code is spread across a number of files, so I’m not absolutely sure these are the correct functions:

    def eval_actor(self, obs):
        h = self._actor_layers(obs)
        a_dist = self._action_dist(h)
        return a_dist

    def log_prob(self, x):
        diff = x - self._mean
        logp = -0.5 * torch.sum(torch.square(diff / self._std), dim=-1)
        logp += -0.5 * self._dim * np.log(2.0 * np.pi) - torch.sum(self._logstd, dim=-1)
        # logp = logp + -0.5 * self._dim * np.log(2.0 * np.pi) - torch.sum(self._logstd, dim=-1)
        return logp

Using the logp = logp + ... line in log_prob didn’t make any difference. I suppose this means that I have to try to dig into eval_actor.

The whole thing is a bit strange, though, because I’m not supposed to have to modify the .py files containing the above functions. Hence my focus on the four functions I provided in my original post.

Hi jsa!

Two comments: the first analyzes the backward-pass error message, and the
second concerns set_detect_anomaly(True).

This error message gives two useful pieces of information:

First, it tells you the type and shape of the tensor being modified inplace,
torch.FloatTensor [64, 1]; and second, it tells you that the tensor has been
modified inplace forty times (“is at version 41; expected version 1 instead”).

Where in your code do you have tensors of that shape? Take a look at them
and the inplace modification might become apparent. Where might such a
tensor be modified many times? Do you have a loop or other repetitive structure
in your forward pass? Look there.

You can use your tensor’s _version property to track down where the inplace
modification occurs with a divide-and-conquer scheme. Look at the candidate
tensors, t, that have the reported shape and print out t._version at various
places in your forward pass. When you find a section of code where _version
increases, you’ve found the section that contains your inplace modification.
Now print out t._version at various places in that section of code to further
narrow down the location. If need be, keep doing this until you’ve bracketed
a single line of code with t._versions that show _version increasing. That
line is your problem.
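
As a purely illustrative sketch of how _version behaves (the tensor here is a
made-up stand-in, not one of your actual tensors):

    import torch

    # Illustrative only: _version ticks up each time a tensor is modified inplace.
    # clone() produces a non-leaf tensor, so the inplace op below is permitted.
    t = torch.randn(64, 1, requires_grad=True).clone()
    print(t._version)   # 0
    t.add_(1.0)         # an inplace modification somewhere in the forward pass
    print(t._version)   # 1 -- the code between the two prints contains the culprit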

It might be the case that you have more than one line of code that makes
an inplace modification (that matters to your backward pass) – if so, you
will need to find and fix them all.

(You may have multiple candidate tensors with the same shape and type.
You can print out t._version for all of them where they’re first created and
then again at the end of your forward pass to determine which one is the
culprit.)

set_detect_anomaly(True) is often very helpful. Look in the “Traceback of
forward call” that detect_anomaly causes to be printed out when the inplace
modification error is detected in the backward pass. Look for lines in that
traceback that reference your code (rather than pytorch internals). This
typically leads you to the line of code that is making the inplace modification.
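
For example (the placement below is just hypothetical; anywhere before
training starts works), you could enable it like this:

    import torch

    # Enable anomaly detection before training starts; the next error raised in
    # backward() will then also print a "Traceback of forward call" for the op
    # that failed, which usually points close to the offending line.
    torch.autograd.set_detect_anomaly(True)

    # Or scope it to a single suspect step with the context manager:
    # with torch.autograd.detect_anomaly():
    #     loss.backward()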

As an aside, I’ve noticed a number of posts where inplace-modification errors
crop up in policy / actor-critic code. Maybe there’s something about these
algorithms that is conducive to such errors, or maybe there’s some sample code
out there with landmines in it. In any event, if you find your bug, please post
what and where it was, how you found it, and how it was caused by / related to
actor-critic (if indeed it was). This could be a big help to those who might have
similar problems in the future.

Good luck.

K. Frank

K. Frank: Thanks for your reply. I will try to deal with it when I have more time.

I went to get some help with my code from a teaching assistant. The TA suggested that I change the _calc_adv function from the way I had it, i.e.

        value = self._model.eval_critic(norm_obs)
        value = torch.squeeze(torch.transpose(value, 0, 1))
        adv = ret - value
        return adv

to the following:

        value = self._model.eval_critic(norm_obs)
        value = torch.squeeze(value).detach()
        adv = ret - value
        return adv

Now the code runs without error.

I am not totally convinced this is a good solution. I’m wondering if this will mess up the critic loss computation. I have done a little bit of training, and the critic loss seems very high compared to the actor loss, and the test return seems kind of low, but perhaps I just haven’t trained enough. (I’m also not an expert in any of this.)

I will train more and see what happens.

FYI, we have a page dedicated to this in the torchrl docs:
https://pytorch.org/rl/reference/generated/knowledge_base/PRO-TIPS.html

Hi jsa!

I think your skepticism about this “fix” is well founded. I think it more likely
that this is sweeping the real problem under the rug rather than correcting it.

detach() disconnects the result of squeeze() from the computation graph
so that backpropagation doesn’t continue on back up through eval_critic().
If, for example, eval_critic() contained the bad inplace modification, you
haven’t fixed it – you just don’t backpropagate through it, so you suppress the
error message and hide the real problem.
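
Here is a tiny, self-contained illustration of this (the tensors are made-up
stand-ins, not your actual model):

    import torch

    critic_w = torch.ones(3, requires_grad=True)   # stand-in for the critic's parameters
    actor_w = torch.ones(3, requires_grad=True)    # stand-in for the actor's parameters

    value = critic_w * 2.0                  # stand-in for eval_critic(norm_obs)
    adv = torch.ones(3) - value.detach()    # the graph is cut at the critic here
    loss = -(adv * actor_w).mean()          # stand-in for the actor loss
    loss.backward()

    print(actor_w.grad)    # populated -- the actor still receives gradients
    print(critic_w.grad)   # None -- nothing flows back into the critic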

This could explain why your critic loss seems high. If you don’t backpropagate
through eval_critic(), the parameters that feed into eval_critic() and
presumably contribute to your “critic loss” don’t get trained, so your critic loss
doesn’t get better.

As an aside, it appears that your TA may be a practitioner of what I sometimes
call “Monte Carlo programming” – change this, change that, add this, delete
that until your program “works.” The problem with this is that your program
probably doesn’t actually work – you’ve just masked the most obvious
consequences of your bugs, but the bugs are still there.

Good luck.

K. Frank

If anyone is waiting for a denouement, I asked the professor about the use of .detach(), specifically whether this would mess up the critic loss computations, and sweep the real issue under the rug. I was told that it was correct to use .detach() as suggested in order to cut off the gradient computation in the right place. (I did not press for a detailed explanation.)

Presumably the professor knows this code better than anybody, so I guess that’s that. Thanks for the help.

I was following your issue. So, did you get the expected results with detach()?

The code ran without error, and the test return was more or less what I was told it should be. So as far as I know, things worked properly.

You should note, however, that @KFrank’s explanation is correct: detaching value does cut the computation graph at this point, and the parameters used to calculate value might not be trained.
Also, @jsa378 explained “I did not press for a detailed explanation”, which I think is also concerning, especially alongside the claim that “presumably the professor knows this code better than anybody”.

Commenting on ptrblck’s last paragraph: It’s a slightly tricky situation for me. You guys maybe/probably know PyTorch better than my professor, but I cannot dump all the code here and expect anybody to comb through all of it.

So I cannot be completely sure whether you guys or my professor are correct about this. All I can say is that after implementing .detach() as suggested, the code runs and performs roughly as the professor said it would. Perhaps using .detach() as suggested is not best practice - or maybe it’s the right thing to do, given the context of the code.

(To anybody reading this: This is not advice. I am not an authority on PyTorch.)

Thanks for everyone’s help.

Hi jsa!

The following is meant as an honest suggestion, not as a rhetorical comment.

First, you have to do whatever is best to keep your TA and professor happy
in the context of your project.

However, since you’re trying to learn pytorch and neural networks, if and when
you have a free moment, you might try to track down where in your code the
inplace modification is occurring.

Then look at the logic of your model and training algorithm and form your
own opinion about whether or not that inplace modification is “legitimate”
and whether the use of .detach() is or is not an appropriate solution in
the context of your specific use case.

Good luck.

K. Frank

K. Frank,

Thanks for the advice. I will see if I have time when the course is over.