I don’t really understand actor-critic training, but it seems that such models
are hotbeds of inplace-modification errors. You might start by looking at this
discussion of some of the ways such errors can arise:
Do you call .backward(retain_graph=True) anywhere? Doing so is
sometimes (usually?) incorrect and can lead to inplace-modification errors.
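If the second backward pass is only there to accumulate gradients from a
second loss, you can often avoid retain_graph=True altogether. A minimal
sketch (the names are made up) that sums the losses and backpropagates through
the shared graph just once:

import torch

net = torch.nn.Linear(4, 2)
out = net(torch.randn(8, 4))

loss1 = out.pow(2).mean()
loss2 = out.abs().mean()

# instead of loss1.backward(retain_graph=True) followed by loss2.backward(),
# call .backward() once on the combined loss:
(loss1 + loss2).backward()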
Note that the shape of the problem tensor, [512, 25], can be a useful
piece of information; see below.
Based on the forward-call traceback and the shape reported for the problem
tensor, it looks like last_fc of one of your Critics (presumably its weight,
a parameter being optimized) is the cause of your problem.
One possibility is that you are doing something like:
loss.backward(retain_graph=True)   # leaves Critic.last_fc in what will become a stale computation graph
...
opt.step()                         # counts as an inplace modification
...
loss.backward(...)                 # backpropagates through the stale graph and hits the modified Critic.last_fc
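Here is a minimal, self-contained sketch of that failure mode. The layer and
optimizer names are made up, but last_fc's weight is given the shape
[512, 25] reported in your error message:

import torch

fc1 = torch.nn.Linear(10, 25)
last_fc = torch.nn.Linear(25, 512)   # weight has shape [512, 25]
opt = torch.optim.SGD(
    list(fc1.parameters()) + list(last_fc.parameters()), lr=0.1
)

loss = last_fc(fc1(torch.randn(8, 10))).sum()

loss.backward(retain_graph=True)   # the graph, with last_fc.weight saved in it, is kept alive
opt.step()                         # updates last_fc.weight in place

loss.backward()   # RuntimeError: one of the variables needed for gradient
                  # computation has been modified by an inplace operation:
                  # [torch.FloatTensor [512, 25]] ...

The usual fix is to rebuild the graph (rerun the forward pass) after the
optimizer step, rather than backpropagating a second time through the
retained graph.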
In any event, the following post (which happens to be about an actor-critic
model) describes various techniques for tracking down inplace-modification
errors:
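One commonly used diagnostic is autograd's anomaly detection, which makes the
error report include a second traceback pointing at the forward-pass operation
whose backward failed. A minimal sketch (exp() saves its output for its
backward pass, so the subsequent inplace mul_() trips autograd's version
check):

import torch

torch.autograd.set_detect_anomaly(True)

w = torch.ones(3, requires_grad=True)
y = torch.exp(w)     # exp() saves its output for use in the backward pass
y.mul_(2)            # inplace modification of that saved tensor
y.sum().backward()   # RuntimeError; anomaly mode additionally prints the
                     # traceback of the forward call (the exp()) whose
                     # backward failed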