I thought that in Python `x = x - a` was exactly the same as `x -= a`. But apparently not with tensors:

In the following code the gradient is not set to `None` after updating the parameter (the `x -= a` case):

```
import torch

x1 = torch.tensor([2.0], requires_grad=True)
x2 = torch.tensor([3.0], requires_grad=True)
y = (x1 * x2).pow(2)
y.backward()
print("x1.grad=", x1.grad)

with torch.no_grad():
    x1 -= 0.01 * x1.grad  # <=============================

print("x1.grad=", x1.grad)
x1.grad.zero_()
print("x1.grad=", x1.grad)
```

output:

```
x1.grad= tensor([36.])
x1.grad= tensor([36.])
x1.grad= tensor([0.])
```

but in the following code the gradient is set to `None` (the `x = x - a` case):

```
import torch

x1 = torch.tensor([2.0], requires_grad=True)
x2 = torch.tensor([3.0], requires_grad=True)
y = (x1 * x2).pow(2)
y.backward()
print("x1.grad=", x1.grad)

with torch.no_grad():
    x1 = x1 - 0.01 * x1.grad  # <=============================

print("x1.grad=", x1.grad)
x1.grad.zero_()
```

output:

```
x1.grad= tensor([36.])
x1.grad= None
Traceback (most recent call last):
  File "D:/gCloud/GoogleDrive/colabai/notes/torch/grad/zforum.py", line 14, in <module>
    x1.grad.zero_()
AttributeError: 'NoneType' object has no attribute 'zero_'
```

Both behaviours are fine for my purposes; I would just like to understand the difference. Thanks.

Because in Python, when you use `x -=` you are using an in-place operation.

I don't know the exact rules in this case, but basically when you do `x -= lr * grad` you are still pointing to the original tensor, whose data has been modified in place.

In the 2nd case you are rebinding the local variable `x1`, so when you print `x1` it no longer points to the tensor you defined at the beginning but to a "totally different variable"; the original tensor is no longer reachable through that name.
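The same distinction exists for plain Python objects that implement in-place operators, e.g. lists. A minimal sketch (pure Python, no PyTorch) using `id()` to show which operation keeps the same object:

```python
# In-place `+=` mutates the existing list: the name still points to the
# same object afterwards, so id() is unchanged.
a = [1, 2, 3]
id_a_before = id(a)
a += [4]
assert id(a) == id_a_before  # same object, mutated in place

# `b = b + [...]` builds a brand-new list and rebinds the name, so id()
# changes and the original list is left behind.
b = [1, 2, 3]
id_b_before = id(b)
b = b + [4]
assert id(b) != id_b_before  # a different object
```

With tensors, the practical consequence is the same: the in-place form keeps the original tensor (and its `.grad`), while the rebinding form makes the name refer to a fresh tensor.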

```
import torch

# Case 1: in-place update of a leaf tensor that requires grad
x1 = torch.tensor([2.0], requires_grad=True)
x2 = torch.tensor([3.0], requires_grad=True)
y = (x1 * x2).pow(2)
y.backward()
print(x1._version)
print(x1.is_leaf)
x1 -= 0.01 * x1.grad  # raises RuntimeError outside no_grad
print(x1._version)
print(x1.is_leaf)

# Case 2: out-of-place update rebinds x1 to a new tensor
import torch
x1 = torch.tensor([2.0], requires_grad=True)
x2 = torch.tensor([3.0], requires_grad=True)
y = (x1 * x2).pow(2)
y.backward()
print(x1._version)
print(x1.is_leaf)
x1 = x1 - 0.01 * x1.grad
print(x1._version)
print(x1.is_leaf)
```

If you run this, you will find that it's not possible to run the 1st case:

```
x1 -= 0.01 * x1.grad
RuntimeError: a leaf Variable that requires grad has been used in an in-place operation.
```

but you can run the second case, in which `x1` won't be a leaf variable.

If you wrap the update in `torch.no_grad()`, the autograd engine is disabled for it. The first case then becomes runnable because there is no longer any gradient tracking, and in the second case the new `x1` will be a leaf variable for the same reason.
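To make that concrete, here is a small sketch of both update styles under `torch.no_grad()` (the helper `make_leaf` is just for this example):

```python
import torch

def make_leaf():
    """Build a leaf tensor with a populated .grad (helper for this demo)."""
    x1 = torch.tensor([2.0], requires_grad=True)
    x2 = torch.tensor([3.0], requires_grad=True)
    (x1 * x2).pow(2).backward()  # d/dx1 = 2 * x1 * x2**2 = 36
    return x1

# In-place update under no_grad: allowed, x1 stays the same leaf tensor,
# and its .grad is left untouched.
x1 = make_leaf()
with torch.no_grad():
    x1 -= 0.01 * x1.grad
print(x1.is_leaf, x1.grad)   # True tensor([36.])

# Out-of-place update under no_grad: x1 is rebound to a new tensor that
# was created with grad tracking disabled, so it is also a leaf, but it
# has no .grad of its own.
x1 = make_leaf()
with torch.no_grad():
    x1 = x1 - 0.01 * x1.grad
print(x1.is_leaf, x1.grad)   # True None
```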


Thanks. `print(id(x1))` also shows your point clearly: with `-=` the id stays the same, and with `x = x - a` it changes.