# Why are the gradients given by PyTorch 0.4.0 and 0.4.1 different on backward?

```python
import torch
import torch.nn as nn

print('x====: {}'.format(x))
print('w====: {}'.format(w))

def f(x):
    x = x.cuda()
    # return x*x*x.sum()

sum_losses = 0

for i in range(2):

    loss = f(x)
    # print(i, loss)

    sum_losses += loss
    loss.backward(torch.ones_like(loss), retain_graph=True)

    update = optimizer(x.grad)
    x = x + update
    print('x-:{}'.format(x))

sum_losses.backward()

w = w + w_update
print('w====: {}'.format(w))
```

PyTorch 0.4.1 prints the following:

```
x====: tensor([1.], requires_grad=True)
```

PyTorch 0.4.0 prints the following:

```
x====: tensor([ 1.])
w====: tensor([ 0.2000])
x-:tensor([ 0.6000])
x-:tensor([ 0.3600])
w====: tensor([ 0.8240])
```

I changed the `optimizer` function as follows and the problem is solved, but I am still confused.

```python
def optimizer(grad):
    return w * (-grad)
```

It looks like your colleague posted the same question here.
I would ask you to keep only one topic alive and keep all the answers there.

Hi,

Have you tried running this with a more recent version of PyTorch?
Which result is the expected one?

Hi, I've tried running the code with PyTorch 1.0.0. The printed results are the same as with PyTorch 0.4.1.

Running your code with the latest PyTorch raises:

```
x-:tensor([0.6000], grad_fn=<AddBackward0>)
Traceback (most recent call last):
  File "foo.py", line 28, in <module>
    loss.backward(torch.ones_like(loss), retain_graph=True)
  File "torch/tensor.py", line 195, in backward
  File "torch/autograd/__init__.py", line 99, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor] is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
```

The problem is that in old versions, `x.grad` was updated during the backward pass with the unsafe `.data`, and so the in-place operation was not properly detected.
This is a good example of where the use of `.data` is dangerous and should be replaced by `.detach()` or `with torch.no_grad()`.
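A minimal sketch of this difference (plain CPU tensors on a current PyTorch, not the code from this thread): an in-place write through `.data` goes unnoticed and silently yields a wrong gradient, while the same write through `.detach()` bumps the version counter and makes the backward pass fail loudly.

```python
import torch

# In-place write through `.data`: autograd is not notified, so the
# backward pass silently uses the new value and the gradient is wrong.
a = torch.tensor([2.0], requires_grad=True)
b = a * a                 # multiplication saves `a` for its backward
a.data.add_(1.0)          # a becomes 3.0 behind autograd's back
b.sum().backward()
print(a.grad)             # tensor([6.]) instead of the correct tensor([4.])

# The same write through `.detach()` bumps the version counter,
# so the backward pass detects the stale value and raises instead.
c = torch.tensor([2.0], requires_grad=True)
d = c * c
c.detach().add_(1.0)
raised = False
try:
    d.sum().backward()
except RuntimeError:
    raised = True         # "... modified by an inplace operation ..."
print('detach version raised:', raised)
```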

You can fix your current code by doing this in old versions:

```python
def optimizer(grad):
    # work on a copy so that later in-place updates of x.grad do not
    # invalidate the value saved by the multiplication
    return -w * grad.clone()
```


I want to know why the latest PyTorch reports the error.
Why is this error caused by updating `x.grad`? I thought it was caused by `x = x + update`, because `loss.backward()` computes the gradient of `x`, so I assumed that changing `x` was what caused the error.

Why does the code also run when I change it to `return w * (-grad)`?

I fixed the `optimizer` function as follows and it works, but I'm still confused about why it works.

```python
def optimizer(grad):
    # negating grad creates a new tensor, so the original x.grad is no
    # longer an operand of the multiplication
    return w * (-grad)
```

Hi,

The problem is that the multiplication needs the values of its operands to compute the backward pass.
If for any reason an operand is modified in place, the computed gradient will be wrong.
The old implementation that used `.data` for gradient accumulation did not notify the autograd of the in-place operation, and thus the gradients were wrong.
The new implementation that uses `torch.no_grad()` does notify the autograd and so throws an error.
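A minimal sketch of that failure mode on a current PyTorch (the names `x` and `w` echo the code above, but this is not the original program): the multiplication saves `x.grad` for its backward, a second `loss.backward()` then accumulates into `x.grad` in place, and backpropagating through the product afterwards trips the version check.

```python
import torch

x = torch.tensor([1.0], requires_grad=True)
w = torch.tensor([0.2], requires_grad=True)

loss = (x * x).sum()
loss.backward(retain_graph=True)   # x.grad is now tensor([2.])

update = -w * x.grad               # saves x.grad: needed for d(update)/dw

loss.backward(retain_graph=True)   # accumulates in place: x.grad -> tensor([4.])

raised = False
try:
    update.sum().backward()        # needs x.grad as it was at save time
except RuntimeError:
    raised = True                  # "... modified by an inplace operation ..."
print('raised:', raised)
```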

Both my suggestion with `.clone()` and your change to `-grad` make a copy of `grad` before passing it to the multiplication. Thus when `grad` is modified in place, the value needed by the multiplication to compute its backward is not affected.
So this computes the correct gradient.
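Sketching the `.clone()` variant on the same toy setup (again, not the original program): the multiplication now saves the copy, so the later in-place accumulation into `x.grad` no longer matters.

```python
import torch

x = torch.tensor([1.0], requires_grad=True)
w = torch.tensor([0.2], requires_grad=True)

loss = (x * x).sum()
loss.backward(retain_graph=True)   # x.grad == tensor([2.])

update = -w * x.grad.clone()       # the copy, not x.grad itself, is saved

loss.backward(retain_graph=True)   # x.grad changes in place to tensor([4.])

update.sum().backward()            # fine: the saved copy is untouched
print(w.grad)                      # -(value of x.grad at save time)
```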

Thanks. I want to make sure my understanding is correct.
1. In the loop, the first loss is computed by `loss = f(x)`, but the following losses are computed by

```python
update = optimizer(x.grad)
x = x + update
loss = f(x)
```

2. Every tensor keeps a version counter. When `loss.backward` executes, each Function saves the version counter of the tensors it needs, and checks it during the backward pass.
When `loss.backward` executes, gradients are accumulated into `x.grad` from the bottom of the graph, so `x.grad` changes in place; but the gradient through `-w*grad` has not been computed yet, so when it finally is, the program reports the error.
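That version counter can be inspected directly through the `_version` attribute (internal, but readable); every in-place operation bumps it:

```python
import torch

t = torch.ones(3)
print(t._version)   # 0: freshly created tensor
t.add_(1.0)         # in-place add
print(t._version)   # 1
t.mul_(2.0)         # in-place multiply
print(t._version)   # 2
```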

3. In the loop, autograd records a new graph every time. If I change `loss.backward(torch.ones_like(loss), retain_graph=True)` to `loss.backward(torch.ones_like(loss), retain_graph=False)`, the program does not report an error inside the loop. But `sum_losses` relies on all the subgraphs recorded in the loop, so `sum_losses.backward()` will report an error.
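The `retain_graph` part can be checked in isolation with a toy graph (a sketch, not the thread's code): once a subgraph's buffers are freed by a backward with `retain_graph=False`, any later backward that needs that subgraph fails.

```python
import torch

x = torch.tensor([1.0], requires_grad=True)
loss = (x * x).sum()
total = loss + 0.0      # `total` reuses the subgraph of `loss`

loss.backward()         # retain_graph defaults to False: buffers are freed

raised = False
try:
    total.backward()    # needs the freed subgraph of `loss`
except RuntimeError:
    raised = True       # "Trying to backward through the graph a second time ..."
print('raised:', raised)
```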

Hi,

I'm not sure I understand what you mean here.
The main point is that `loss.backward()` used to modify `x.grad` in an unsafe way. So if an operation used `x.grad`, the wrong behavior you observed would happen.

I am trying to port code written in PyTorch 0.2 to PyTorch 1.7.
Somehow I managed to run the code in PyTorch 1.7, but the results are different from the original code.
I believe the loss computed in these two versions is different.

Here is the link to the code, written in PyTorch 0.2:

Can you suggest changes to `gen_step()` and `dis_step()` in the `base_model.py` file in order to port it to 1.7?
Thanks