When running the code below, I get the following error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [8, 1]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

The code below performs two forward passes for each backward pass:

for epoch in range(10):
    count = 0
    for i in range(0, len(X), batch_size):
        Xbatch = X[i:i+batch_size]
        y_pred = model(Xbatch)
        ybatch = y[i:i+batch_size]
        loss = loss_fn(y_pred, ybatch)
        if count % 2 == 0:
            if count != 0:
                optimizer.zero_grad()
                # PyTorch doesn't like this
                ((loss_1 + loss_2) / 2).backward()
                optimizer.step()
            loss_1 = loss
        else:
            loss_2 = loss
        count += 1

but if I slightly alter the code like so, it works fine:

for epoch in range(10):
    count = 0
    for i in range(0, len(X), batch_size):
        Xbatch = X[i:i+batch_size]
        y_pred = model(Xbatch)
        ybatch = y[i:i+batch_size]
        loss = loss_fn(y_pred, ybatch)
        if (count+1) % 2 == 0:
            optimizer.zero_grad()
            ((loss_1 + loss) / 2).backward()
            optimizer.step()
        else:
            loss_1 = loss
        count += 1

optimizer.step() modifies the parameters of your model inplace, and, in
your first version, causes the inplace-modification error.

For count = 0, you compute loss and assign it to loss_1. loss_1 is
linked to the parameters of model by the computation graph.

For count = 1, you compute a new value of loss and assign it to loss_2.

For count = 2, you compute a new value of loss, backpropagate loss_1 and loss_2, call optimizer.step() (which modifies the
parameters of model inplace), and assign loss to loss_1. At this point loss_1 depends on model’s old parameter values, because it was
computed (as loss) prior to optimizer.step().

For count = 3, you compute a new value of loss and assign it to loss_2 (but do not call .backward() nor optimizer.step()).

Then for count = 4, you call .backward() on loss_1 + loss_2.
Backpropagating loss_1 backpropagates through the model parameters,
but loss_1 depends on the old values of the model parameters,
triggering the inplace-modification error.
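This failure sequence can be reproduced in isolation. A minimal sketch, assuming a small two-layer stand-in for model (the layer sizes, SGD, and the random data are all illustrative):

```python
import torch

# Stand-in for the model in the question: two stacked Linears, so
# backprop into the first layer needs the second layer's weight.
model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.Linear(8, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
X = torch.randn(4, 8)

stale = model(X).sum()        # plays the role of loss_1 at count = 2
model(X).sum().backward()     # populate .grad so step() has something to apply
optimizer.step()              # modifies the parameters inplace

fresh = model(X).sum()        # plays the role of loss_2 at count = 3
try:
    ((stale + fresh) / 2).backward()  # the backward at count = 4
except RuntimeError as err:
    print("RuntimeError:", err)       # the inplace-modification error
```

Removing the optimizer.step() between the two forward passes makes the combined backward succeed.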

In your second version, each iteration either backpropagates loss_1 together with the newly-computed loss and calls optimizer.step()
(which modifies your model inplace), or assigns the newly-computed loss to loss_1. So whenever you call .backward(), both loss_1
and loss are newly computed in the sense that both have been computed after the most recent
call to optimizer.step(). They both backpropagate through the versions
of model’s parameters that they were computed with (so no inplace error.)
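A minimal sketch of this safe ordering, using the same illustrative two-layer stand-in model: both losses are created after the most recent optimizer.step(), and the step happens only after the combined backward:

```python
import torch

model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.Linear(8, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
X = torch.randn(4, 8)

for _ in range(3):
    loss_1 = model(X).sum()   # first forward pass after the last step()
    loss = model(X).sum()     # second forward pass, same parameter versions
    optimizer.zero_grad()
    ((loss_1 + loss) / 2).backward()  # both graphs see the current parameters
    optimizer.step()          # the inplace update happens only after backward
print("no inplace error")
```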

Note that a model consisting of a single Linear would not hit this error. When you backpropagate through Linear.weight in order to compute Linear.weight’s gradient, you
don’t need the value of Linear.weight itself, so it doesn’t matter that
it’s been modified inplace. (This assumes x doesn’t carry requires_grad = True.)

But you do need the value of Linear.weight to backpropagate through
the input to Linear.
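A sketch of this point with a stand-alone Linear (sizes and data illustrative; note that x is created with requires_grad left at its default of False):

```python
import torch

lin = torch.nn.Linear(8, 1)
opt = torch.optim.SGD(lin.parameters(), lr=0.1)
x = torch.randn(4, 8)        # requires_grad is False

stale = lin(x).sum()         # the graph saves x (needed for weight.grad),
                             # but not the weight itself
lin(x).sum().backward()
opt.step()                   # weight and bias modified inplace

(stale + lin(x).sum()).backward()  # no error: stale's backward never reads weight
print("single Linear: no inplace error")
```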

But your model stacks two Linears, so you do need the value of the second Linear’s weight: to compute gradients for
the first Linear’s parameters, you must backpropagate through the second Linear’s input, and that computation uses
the second Linear’s weight.
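Equivalently, even a single Linear hits the error once its input requires grad, because backprop must then produce a gradient for the input, and that computation reads the (now modified) weight. A sketch under the same illustrative assumptions:

```python
import torch

lin = torch.nn.Linear(8, 1)
opt = torch.optim.SGD(lin.parameters(), lr=0.1)
x = torch.randn(4, 8, requires_grad=True)  # now a gradient flows into the input

stale = lin(x).sum()       # the graph must save the weight to compute x.grad
lin(x).sum().backward()
opt.step()                 # weight modified inplace -> saved copy is stale

try:
    (stale + lin(x).sum()).backward()
except RuntimeError as err:
    print("RuntimeError:", err)  # same inplace-modification error as with two Linears
```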