PyTorch Inplace Operation Error

When running the code below, I get the following error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [8, 1]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

The code below performs two forward passes for every backward pass:

for epoch in range(10):
    count = 0
    for i in range(0, len(X), batch_size):
        Xbatch = X[i:i+batch_size]
        y_pred = model(Xbatch)
        ybatch = y[i:i+batch_size]

        loss = loss_fn(y_pred, ybatch)

        if count % 2 == 0:
            if count != 0:
                optimizer.zero_grad()
                # Pytorch doesn't like this
                ((loss_1 + loss_2) / 2).backward()
                optimizer.step()
            loss_1 = loss
        else:
            loss_2 = loss
        count += 1

but if I slightly alter the code like so, it works fine:

for epoch in range(10):
    count = 0
    for i in range(0, len(X), batch_size):
        Xbatch = X[i:i+batch_size]
        y_pred = model(Xbatch)
        ybatch = y[i:i+batch_size]

        loss = loss_fn(y_pred, ybatch)

        if (count+1) % 2 == 0:
            optimizer.zero_grad()
            ((loss_1 + loss) / 2).backward()
            optimizer.step()
        else:
            loss_1 = loss

        count += 1

Why does the first version cause this issue?

I cannot reproduce the issue using:

model = nn.Linear(1, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()


for epoch in range(10):
    count = 0
    for i in range(0, 10):
        x = torch.randn(1, 1)
        output = model(x)
        target = torch.randn_like(x)
        
        loss = loss_fn(output, target)

        if count % 2 == 0:
            if count != 0:
                optimizer.zero_grad()
                # Pytorch doesn't like this
                ((loss_1 + loss_2) / 2).backward()
                optimizer.step()
            print("updating loss_1")
            loss_1 = loss
        else:
            print("updating loss_2")
            loss_2 = loss
        count += 1

Could you check if my code reproduces it on your side and, if not, add the missing pieces to make it executable?

Hi Sergio!

optimizer.step() modifies the parameters of your model inplace, and, in
your first version, causes the inplace-modification error.

For count = 0, you compute loss and assign it to loss_1. loss_1 is
linked to the parameters of model by the computation graph.

For count = 1, you compute a new value of loss and assign it to loss_2.

For count = 2, you compute a new value of loss, backpropagate
loss_1 and loss_2, call optimizer.step(), which modifies the
parameters of model inplace, and assign loss to loss_1. At this point
loss_1 depends on model’s old parameter values because it was
computed (as loss) prior to optimizer.step().

For count = 3, you compute a new value of loss and assign it to
loss_2 (but do not call .backward() nor optimizer.step()).

Then for count = 4, you call .backward() on loss_1 + loss_2.
Backpropagating loss_1 backpropagates through the model parameters,
but loss_1 depends on the old values of the model parameters,
triggering the inplace-modification error.
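The mechanism can be seen in a tiny standalone sketch (hedged: `_version` is an internal counter, inspected here only for illustration). Autograd saves tensors it will need for backward; an inplace update bumps the saved tensor's version, and backward errors out when the versions no longer match:

```python
import torch

w = torch.ones(1, requires_grad=True)
y = w * w                  # autograd saves w to compute dy/dw later
print(w._version)          # -> 0 (internal version counter)

with torch.no_grad():
    w.add_(1.0)            # inplace update, just like optimizer.step()
print(w._version)          # -> 1: the saved copy of w is now stale

err = None
try:
    y.backward()           # unpacking the saved w checks its version
except RuntimeError as e:
    err = e
print(err)                 # "... modified by an inplace operation ..."
```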

In your second version, on each iteration you either backpropagate the
newly-computed loss together with loss_1 and call optimizer.step()
(which modifies your model inplace), or you assign the newly-computed
loss to loss_1. So whenever you next call .backward(), both loss_1
and loss are newly computed, in the sense that both have been computed
after the most recent call to optimizer.step(). They therefore both
backpropagate through the versions of model's parameters they were
computed with (so no inplace error).

Best.

K. Frank

Hi @ptrblck!

At issue is the use of a single Linear. When you backpropagate through
Linear.weight in order to compute Linear.weight’s gradient, you
don’t need the value of Linear.weight itself, so it doesn’t matter that
it’s been modified inplace. (x doesn’t carry requires_grad = True.)
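A quick sketch of that single-Linear case (hedged: the no_grad add_ merely stands in for what optimizer.step() does):

```python
import torch

lin = torch.nn.Linear(1, 1)
x = torch.randn(1, 1)        # x does not require grad
loss = lin(x).sum()

with torch.no_grad():
    lin.weight.add_(1.0)     # inplace update, standing in for optimizer.step()

loss.backward()              # no error: the weight's gradient needs only x
print(lin.weight.grad)
```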

But you do need the value of Linear.weight to backpropagate through
the input to Linear.

Try:

model = torch.nn.Sequential(
    torch.nn.Linear(1, 1),
    torch.nn.Linear(1, 1),
)

This will trigger Sergio’s error.

Now you do need the value of the second Linear’s weight because you
need to backpropagate through the first Linear to compute gradients for
the first Linear’s parameters.
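Putting that together with ptrblck's script (a sketch; the seed and lr are arbitrary), the two-layer model reproduces the error on the second backward call:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(1, 1), nn.Linear(1, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

err = None
count = 0
try:
    for i in range(10):
        x = torch.randn(1, 1)
        target = torch.randn_like(x)
        loss = loss_fn(model(x), target)

        if count % 2 == 0:
            if count != 0:
                optimizer.zero_grad()
                ((loss_1 + loss_2) / 2).backward()
                optimizer.step()
            loss_1 = loss      # stale once step() has run (count >= 2)
        else:
            loss_2 = loss
        count += 1
except RuntimeError as e:
    err = e

print(err)   # raised at count == 4, the second backward call
```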

Best.

K. Frank

This makes perfect sense. Thank you @KFrank and @ptrblck !

Ah yes, of course! Thanks for pointing out this mistake 🙂