I thought that in Python x = x - a was exactly the same as x -= a, but apparently not with tensors:
In the following code the gradient is not set to None after updating the parameter (the x -= a case):
import torch
x1 = torch.tensor([2.0], requires_grad=True)
x2 = torch.tensor([3.0], requires_grad=True)
y = (x1 * x2).pow(2)
y.backward()
print("x1.grad=", x1.grad)
with torch.no_grad():
    x1 -= 0.01 * x1.grad  # <=============================
print("x1.grad=", x1.grad)
x1.grad.zero_()
print("x1.grad=", x1.grad)
output:
x1.grad= tensor([36.])
x1.grad= tensor([36.])
x1.grad= tensor([0.])
But with the following code the gradient is set to None (the x = x - a case):
import torch
x1 = torch.tensor([2.0], requires_grad=True)
x2 = torch.tensor([3.0], requires_grad=True)
y = (x1 * x2).pow(2)
y.backward()
print("x1.grad=", x1.grad)
with torch.no_grad():
    x1 = x1 - 0.01 * x1.grad  # <=============================
print("x1.grad=", x1.grad)
x1.grad.zero_()
output:
Traceback (most recent call last):
File "D:/gCloud/GoogleDrive/colabai/notes/torch/grad/zforum.py", line 14, in <module>
    x1.grad.zero_()
AttributeError: 'NoneType' object has no attribute 'zero_'
x1.grad= tensor([36.])
x1.grad= None
Both approaches work for me; I would just like to understand this behaviour. Thanks.
Because in Python, when you use x -= you are using an in-place operation.
I don't know exactly how the rules work in this case, but basically when you do x -= lr * grad you are still pointing to the original tensor, whose data has been modified in place.
In the 2nd case you are rebinding the local name x1, so when you then print x1 it no longer points to the tensor you defined at the beginning but to a "totally different variable", and the original tensor is simply no longer referenced by that name.
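As a side note, the same distinction exists for ordinary mutable Python objects. A minimal pure-Python sketch (my own, not from the thread) of in-place augmented assignment versus rebinding:

a = [1, 2, 3]
b = a                    # b and a refer to the same list
a += [4]                 # in-place: mutates the existing list
print(b)                 # [1, 2, 3, 4] -> b sees the change
print(id(a) == id(b))    # True, still the same object

a = a + [5]              # rebinding: builds a new list and rebinds the name a
print(b)                 # [1, 2, 3, 4] -> b still refers to the old list
print(id(a) == id(b))    # False, a now names a different object

The tensor examples below show the same thing using _version and is_leaf instead of id.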
import torch
x1 = torch.tensor([2.0], requires_grad=True)
x2 = torch.tensor([3.0], requires_grad=True)
y = (x1 * x2).pow(2)
y.backward()
print(x1._version)
print(x1.is_leaf)
x1 -= 0.01 * x1.grad
print(x1._version)
print(x1.is_leaf)
import torch
x1 = torch.tensor([2.0], requires_grad=True)
x2 = torch.tensor([3.0], requires_grad=True)
y = (x1 * x2).pow(2)
y.backward()
print(x1._version)
print(x1.is_leaf)
x1 = x1 - 0.01 * x1.grad
print(x1._version)
print(x1.is_leaf)
If you run this, you will find that it's not possible to run the 1st case:
x1 -= 0.01 * x1.grad
RuntimeError: a leaf Variable that requires grad has been used in an in-place operation.
but you can run the second case, in which x1 won't be a leaf variable anymore.
If you add torch.no_grad(), it disables the autograd engine: the first case becomes runnable because there is no longer any grad tracking, and in the second case x1 will again be a leaf variable (now with requires_grad=False) for the same reason.
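To make that concrete, here is a small sketch (my own, reusing the setup from above) running both updates inside torch.no_grad() and checking is_leaf, requires_grad and grad:

import torch

# Case 1: in-place update inside no_grad -> same tensor, still a leaf, grad kept
x1 = torch.tensor([2.0], requires_grad=True)
x2 = torch.tensor([3.0], requires_grad=True)
y = (x1 * x2).pow(2)
y.backward()
with torch.no_grad():
    x1 -= 0.01 * x1.grad
print(x1.is_leaf, x1.requires_grad, x1.grad)   # True True tensor([36.])

# Case 2: out-of-place update inside no_grad -> new leaf tensor, no grad tracking
x1 = torch.tensor([2.0], requires_grad=True)
y = (x1 * x2).pow(2)
y.backward()
with torch.no_grad():
    x1 = x1 - 0.01 * x1.grad
print(x1.is_leaf, x1.requires_grad, x1.grad)   # True False None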
Thanks.
print(id(x1)) before and after the update also shows your point clearly.
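For completeness, a small sketch (not from the original thread) of that id() check, assuming the same setup as above:

import torch

x1 = torch.tensor([2.0], requires_grad=True)
x2 = torch.tensor([3.0], requires_grad=True)
y = (x1 * x2).pow(2)
y.backward()

with torch.no_grad():
    before = id(x1)
    x1 -= 0.01 * x1.grad          # in-place: same Python object
    print(before == id(x1))       # True

    before = id(x1)
    x1 = x1 - 0.01 * x1.grad      # rebinding: a new tensor object
    print(before == id(x1))       # False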