Use of the addition assignment operator (+=) on a tensor leads to a RuntimeError

This question is similar to the thread Runtime Error in backpropagation of the graph second time, but I wanted to ask about the implementation logic behind the issue.

Using the addition assignment operator on a tensor with itself (t += t) raises RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation during the backward pass. Replacing this operation with an explicit assignment of the addition (e.g. t = t + t) works without issues.

A minimal example to replicate the error:

import torch

# Define the model
class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.linear1 = torch.nn.Linear(3, 1)
        self.linear2 = torch.nn.Linear(1, 3)
        self.activation = torch.nn.ReLU()

    def forward(self, x):
        x = self.linear1(x)
        x = self.activation(x)
        x = self.linear2(x)
        x = self.activation(x)
        return x

model = Model()

# Define the optimizer and loss function
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
loss_fn = torch.nn.MSELoss()

# Define the input and target tensors
x = torch.tensor([[1, 2, 3], [1, 2, 3]], dtype=torch.float32)
y = torch.tensor([[4, 5, 6], [7, 8, 9]], dtype=torch.float32)

# Perform a forward pass, compute the loss, perform a backward pass, and update the weights
model.train()
optimizer.zero_grad()
outputs = model(x)
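# In-place addition on the model output: this is the line that triggers the RuntimeError on backward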
outputs += outputs
loss = loss_fn(outputs, y)
loss.backward()
optimizer.step()

Is this behavior expected? I was always under the impression that t += t is simply syntactic sugar for t = t + t.

Yes, it’s expected.
+= is an in-place operation that Python dispatches to __iadd__, which differs from a = a + x, which relies on __add__.

The former is, as the error says, in-place: it overwrites the tensor’s data directly in memory. If autograd saved that tensor during the forward pass to compute gradients later, the values it needs are gone by the time backward() runs, which is exactly what the RuntimeError reports.
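A minimal sketch of the difference, using exp because its backward pass reuses the saved forward output (the tensor names here are only for illustration):

import torch

t = torch.randn(3, requires_grad=True)

# In-place: += dispatches to __iadd__ (the in-place add), which overwrites
# the tensor that exp() saved for its backward pass.
y = torch.exp(t)
y += y
# y.sum().backward()  # RuntimeError: ... modified by an inplace operation

# Out-of-place: y + y allocates a new tensor and rebinds the name y,
# so the tensor saved by exp() is left untouched.
y = torch.exp(t)
y = y + y
y.sum().backward()    # works

In the posted snippet the same thing happens: outputs += outputs overwrites a tensor that autograd saved during the forward pass, while outputs = outputs + outputs builds a new tensor and leaves the saved one intact, which is why the out-of-place form runs cleanly.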
