Here is what you are trying to prove. Let T1 and T2 be two tensors created by the torch.randn() function with the same random seed, the only difference between them being the moment at which requires_grad is set to True. That is,
import torch

seed = 42

# use the seed to create the first random tensor
torch.random.manual_seed(seed)
T1 = torch.randn(2, 5, requires_grad=True)

# use the same seed to create the second random tensor
torch.random.manual_seed(seed)
T2 = torch.randn(2, 5)
T2.requires_grad_(True)  # notice the in-place operation
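As a quick sanity check (a small addition on top of the snippet above), you can verify that the two tensors hold identical values and that both are leaf tensors:

```python
import torch

seed = 42
torch.random.manual_seed(seed)
T1 = torch.randn(2, 5, requires_grad=True)
torch.random.manual_seed(seed)
T2 = torch.randn(2, 5)
T2.requires_grad_(True)

# same values, and both are leaves of the autograd graph
print(torch.equal(T1, T2))      # True
print(T1.is_leaf, T2.is_leaf)   # True True
```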
Now, let us perform the exact same operations on both tensors T1 and T2. Once we call the backward() method with a tensor of the same shape as input (in this case I chose a tensor of all ones), both T1 and T2 should have the same value in their grad attribute.
# for T1
x1 = 3 * T1
y1 = x1 + 1
z1 = y1 * y1
# for T2
x2 = 3 * T2
y2 = x2 + 1
z2 = y2 * y2
# calling the backward method
z1.backward(torch.ones_like(z1))
z2.backward(torch.ones_like(z2))
# printing the .grad for T1 and T2
print(T1.grad)
print(T2.grad)
#tensor([[ 12.0604, 8.3186, 10.2203, 10.1460, -14.2114],
# [ 2.6461, 45.7476, -5.4839, 14.3098, 10.8123]])
#tensor([[ 12.0604, 8.3186, 10.2203, 10.1460, -14.2114],
# [ 2.6461, 45.7476, -5.4839, 14.3098, 10.8123]])
You get the same value for both.
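You can also verify that value by hand: since z = (3*T + 1)**2 elementwise, the gradient is dz/dT = 2*(3*T + 1)*3 = 18*T + 6. A minimal check (same seed and operations as above, just written compactly):

```python
import torch

torch.random.manual_seed(42)
T = torch.randn(2, 5, requires_grad=True)

z = (3 * T + 1) ** 2           # same chain of operations as above
z.backward(torch.ones_like(z))

# the autograd gradient should match the hand-derived 18*T + 6
print(torch.allclose(T.grad, 18 * T.detach() + 6))  # True
```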
HOWEVER, the code in the link you are referring to (url) aims at something different. Let me paste the code that originated the confusion:
weights = torch.randn(784, 10) / math.sqrt(784)
weights.requires_grad_()
For this case, it does matter where requires_grad is set to True. If you try to set it in the first line, like this
weights = torch.randn(784, 10, requires_grad=True) / math.sqrt(784)
then after arbitrary operations are performed on weights and the backward() method is called, you will see a warning from PyTorch saying that you are trying to access the grad attribute of a non-leaf tensor, and weights.grad will be None. Why? Because in that case, weights does not satisfy the definition of a leaf tensor: a leaf variable is a variable that was not created by any operation tracked by the autograd engine (see this post for further examples). So, what is keeping weights from being a leaf variable? The division by sqrt(784).
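A minimal sketch contrasting the two cases (784 and 10 are just the shapes from the linked tutorial):

```python
import math
import torch

# non-leaf: the division is an autograd-tracked operation applied
# to a tensor that already requires grad, so the result is not a leaf
w_bad = torch.randn(784, 10, requires_grad=True) / math.sqrt(784)
print(w_bad.is_leaf)   # False -> w_bad.grad would stay None after backward()

# leaf: requires_grad_ is set only after all "creation" ops are done,
# so nothing tracked by autograd created this tensor
w_good = torch.randn(784, 10) / math.sqrt(784)
w_good.requires_grad_()
print(w_good.is_leaf)  # True -> w_good.grad will be populated
```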
Try it yourself and let me know!