I thought that in Python x = x - a was exactly the same as x -= a, but apparently not with tensors:
In the following code the gradient is not set to None after updating the parameter (the x -= a case):
import torch
x1 = torch.tensor([2.0], requires_grad=True)
x2 = torch.tensor([3.0], requires_grad=True)
y = (x1 * x2).pow(2)
y.backward()
print("x1.grad=", x1.grad)
with torch.no_grad():
    x1 -= 0.01 * x1.grad  # <=============================
print("x1.grad=", x1.grad)
x1.grad.zero_()
print("x1.grad=", x1.grad)
output:
x1.grad= tensor([36.])
x1.grad= tensor([36.])
x1.grad= tensor([0.])
But with the following code the gradient is set to None (the x = x - a case):
import torch
x1 = torch.tensor([2.0], requires_grad=True)
x2 = torch.tensor([3.0], requires_grad=True)
y = (x1 * x2).pow(2)
y.backward()
print("x1.grad=", x1.grad)
with torch.no_grad():
    x1 = x1 - 0.01 * x1.grad  # <=============================
print("x1.grad=", x1.grad)
x1.grad.zero_()
output:
Traceback (most recent call last):
File "D:/gCloud/GoogleDrive/colabai/notes/torch/grad/zforum.py", line 14, in <module>
    x1.grad.zero_()
AttributeError: 'NoneType' object has no attribute 'zero_'
x1.grad= tensor([36.])
x1.grad= None
Both approaches work for me; I would just like to understand this behaviour. Thanks.
Because in Python, when you use x -= you are using an in-place operation.
I don't know exactly how the rules work in this case, but basically when you do x -= lr * grad you are still pointing to the original tensor, whose data has been modified in place.
In the 2nd case you are rebinding the local name x1, so when you then print x1 it no longer points to the tensor you defined at the beginning but to a "totally different variable", and the original tensor is simply no longer referenced by that name.
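As a side note, the same distinction exists for ordinary mutable Python objects. A minimal pure-Python sketch (my own, not from the thread) of in-place augmented assignment versus rebinding:

a = [1, 2, 3]
b = a                    # b and a refer to the same list
a += [4]                 # in-place: mutates the existing list
print(b)                 # [1, 2, 3, 4] -> b sees the change
print(id(a) == id(b))    # True, still the same object

a = a + [5]              # rebinding: builds a new list and rebinds the name a
print(b)                 # [1, 2, 3, 4] -> b still refers to the old list
print(id(a) == id(b))    # False, a now names a different object

The tensor examples below show the same thing using _version and is_leaf instead of id.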
import torch
x1 = torch.tensor([2.0], requires_grad=True)
x2 = torch.tensor([3.0], requires_grad=True)
y = (x1 * x2).pow(2)
y.backward()
print(x1._version)
print(x1.is_leaf)
x1 -= 0.01 * x1.grad
print(x1._version)
print(x1.is_leaf)
import torch
x1 = torch.tensor([2.0], requires_grad=True)
x2 = torch.tensor([3.0], requires_grad=True)
y = (x1 * x2).pow(2)
y.backward()
print(x1._version)
print(x1.is_leaf)
x1 = x1 - 0.01 * x1.grad
print(x1._version)
print(x1.is_leaf)
If you run this, you will find that it's not possible to run the 1st case:
x1 -= 0.01 * x1.grad
RuntimeError: a leaf Variable that requires grad has been used in an in-place operation.
but you can run the second case, in which x1 won't be a leaf variable anymore.
If you add torch.no_grad(), it disables the autograd engine: the first case becomes runnable because there is no longer any grad tracking, and in the second case x1 will again be a leaf variable (now with requires_grad=False) for the same reason.
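To make that concrete, here is a small sketch (my own, reusing the setup from above) running both updates inside torch.no_grad() and checking is_leaf, requires_grad and grad:

import torch

# Case 1: in-place update inside no_grad -> same tensor, still a leaf, grad kept
x1 = torch.tensor([2.0], requires_grad=True)
x2 = torch.tensor([3.0], requires_grad=True)
y = (x1 * x2).pow(2)
y.backward()
with torch.no_grad():
    x1 -= 0.01 * x1.grad
print(x1.is_leaf, x1.requires_grad, x1.grad)   # True True tensor([36.])

# Case 2: out-of-place update inside no_grad -> new leaf tensor, no grad tracking
x1 = torch.tensor([2.0], requires_grad=True)
y = (x1 * x2).pow(2)
y.backward()
with torch.no_grad():
    x1 = x1 - 0.01 * x1.grad
print(x1.is_leaf, x1.requires_grad, x1.grad)   # True False None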
Thanks.
print(id(x1)) before and after the update also shows your point clearly.
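For completeness, a small sketch (not from the original thread) of that id() check, assuming the same setup as above:

import torch

x1 = torch.tensor([2.0], requires_grad=True)
x2 = torch.tensor([3.0], requires_grad=True)
y = (x1 * x2).pow(2)
y.backward()

with torch.no_grad():
    before = id(x1)
    x1 -= 0.01 * x1.grad          # in-place: same Python object
    print(before == id(x1))       # True

    before = id(x1)
    x1 = x1 - 0.01 * x1.grad      # rebinding: a new tensor object
    print(before == id(x1))       # False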