x-:tensor([0.6000], grad_fn=<AddBackward0>)
x-.grad: None
Traceback (most recent call last):
File "foo.py", line 28, in <module>
loss.backward(torch.ones_like(loss), retain_graph=True)
File "torch/tensor.py", line 195, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "torch/autograd/__init__.py", line 99, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [1]] is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

The problem is that in old versions x.grad was updated (during gradient accumulation in backward) through the unsafe .data attribute, so the in-place operation was not properly detected.
This is a good example of why using .data is dangerous; it should be replaced by .detach() or by a torch.no_grad() block.
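
As a rough illustration of the difference, the sketch below looks at the tensor's version counter through the private `_version` attribute. The tensor `g` is a made-up stand-in for x.grad, not a name from the original code, and `_version` is an internal detail that may change between releases.

```python
import torch

g = torch.tensor([3.0])        # hypothetical stand-in for x.grad
v0 = g._version                # version counter of the fresh tensor

g.data.add_(1.0)               # in-place write through .data
v_data = g._version            # autograd is not told: counter unchanged

g.detach().add_(1.0)           # detach() shares the version counter
v_detach = g._version          # counter bumped by one

with torch.no_grad():
    g.add_(1.0)                # in-place write under no_grad
v_no_grad = g._version         # counter bumped again

print(v0, v_data, v_detach, v_no_grad, g)
```

All three writes change the values stored in `g`, but only the .detach() and torch.no_grad() variants leave a trace that autograd can check later.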

You can fix your current code by doing this in old versions:

I want to know why the latest PyTorch reports this error.
Why is the error caused by the update to x.grad? I thought it was caused by x = x + update: loss.backward() computes the gradient of x, so I assumed that changing x causes the error.

Also, why does the code run when I change it to return w * (-grad)?

The problem is that the multiplication needs the values of its operands to compute the backward pass.
If for any reason one of these operands is modified in place, the computed gradient will be wrong.
The old implementation, which used .data for gradient accumulation, did not notify autograd of the in-place operation, so the gradients were wrong.
The new implementation, which uses torch.no_grad(), does notify autograd and so throws an error.
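
To make the two behaviours concrete, here is a minimal sketch. It is not the code from this thread: `w`, `g` and the helper `grad_after_inplace` are made up for illustration, with `g` playing the role of x.grad being modified after it was used in a multiplication.

```python
import torch

def grad_after_inplace(use_data):
    """Multiply w by g, modify g in place, then backpropagate."""
    w = torch.tensor([2.0], requires_grad=True)
    g = torch.tensor([3.0])        # hypothetical stand-in for x.grad
    out = w * g                    # mul saves g to compute d(out)/dw
    if use_data:
        g.data.add_(1.0)           # old style: autograd is not notified
    else:
        with torch.no_grad():
            g.add_(1.0)            # new style: version counter is bumped
    out.backward()
    return w.grad

# With .data the backward pass silently uses the modified value of g.
wrong = grad_after_inplace(use_data=True)

# With torch.no_grad() the version mismatch is detected in backward.
try:
    grad_after_inplace(use_data=False)
    raised = False
except RuntimeError as err:
    raised = True
    print("RuntimeError:", str(err)[:80])

print(wrong, raised)
```

The correct gradient of `out` with respect to `w` is the value of `g` at the time of the multiplication (3.0); the .data variant returns the value after the in-place update instead.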

Both my suggestion with .clone() and your change to do -grad make a copy of grad before passing it to the multiplication. Thus when grad is modified inplace, it does not modify the value needed by the multiplication to compute its backward.
So this will compute the correct gradient.
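
A small sketch of why the copy helps, again with made-up names (`w`, `grad`, `grad_of_w`) rather than the code from this thread: the multiplication saves the copy, so a later in-place update of `grad` leaves the saved operand untouched.

```python
import torch

def grad_of_w(make_copy):
    """Multiply w by a copy of grad, then modify grad in place."""
    w = torch.tensor([2.0], requires_grad=True)
    grad = torch.tensor([3.0])     # hypothetical stand-in for x.grad
    operand = make_copy(grad)      # new tensor with its own storage
    out = w * operand              # mul saves the copy, not grad itself
    with torch.no_grad():
        grad.add_(1.0)             # in-place update of the original
    out.backward()                 # no error: the saved operand is unchanged
    return w.grad

via_clone = grad_of_w(lambda g: g.clone())   # explicit copy
via_neg = grad_of_w(lambda g: -g)            # negation also creates a new tensor

print(via_clone, via_neg)
```

In both cases the gradient reflects the value of `grad` at the time of the multiplication (3.0 and -3.0 respectively), not the value after the in-place update.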

Thanks. I want to make sure my understanding is correct.
1.
In the loop, the first loss is computed as loss = f(x), but the following losses are computed by

update = optimizer(x.grad)
x = x + update
loss = f(x)

Every tensor keeps a version counter. When loss.backward() runs, the Function saves the new version counter of the tensor and checks it in backward.
When loss.backward() runs, the gradients are computed from the bottom up, so x.grad changes before the gradient of -w*grad has been computed, and the program reports the error.

In the loop, autograd records a new graph on every iteration. If I change loss.backward(torch.ones_like(loss), retain_graph=True) to loss.backward(torch.ones_like(loss), retain_graph=False), the program does not report an error inside the loop. But sum_losses relies on all the subgraphs recorded in the loop, so sum_losses.backward() will report an error.

I’m not sure I understand what you mean here.
The main point is that loss.backward() used to modify x.grad in an unsafe way. So if an operation used x.grad, the wrong behavior you observed would happen.

I am trying to port code written in PyTorch 0.2 to PyTorch 1.7.
I managed to get the code running in PyTorch 1.7, but the results are different from those of the original code.
I believe the loss computed in these two versions is different.

Here is the link to the code written in PyTorch 0.2:

Can you suggest changes to gen_step() and dis_step() in the base_model.py file in order to port it to 1.7?
Thanks