Bug in autograd/PyTorch?

Hey,

After analysing my last topic 124823, I am convinced that this behaviour of autograd/PyTorch is not okay. In the code below, the variable c does not get updated after the optimizer step, in contrast to a and b:

import torch
import torch.optim as optim

# Define a variable w to optimize
w = torch.tensor([1., 2., 3., 4.], requires_grad=True)

# Define variables a and c

a = torch.tensor([2])
c = torch.tensor([3, 3])

# Assign values from w to a, b and c

a = w[0]
b = w[0:2]
c[0] = w[0]

# Define optimizer

opt = optim.Adam([w]) 

# Define loss function
loss = w.sum()

# Calculate the gradients
loss.backward()

# Print old values of w, a, b, c
print("w: ",w)
print("a: ",a)
print("b: ",b)
print("c: ",c)

# Optimize

opt.step()

print("=======")
print("After training")
print("=======")

# Print new values of w, a, b, c
print("w: ",w) 
print("a: ",a) # should be updated as well
print("b: ",b) # should be updated as well
print("c: ",c) # should be updated as well

I don’t think this has anything to do with autograd, but rather with the fact that when you assign a variable directly (rather than writing into an index range of a preallocated tensor), you essentially get a view that shares the same underlying storage. For example, if you add

print("w ptr: ", w.storage().data_ptr())
print("a ptr: ", a.storage().data_ptr())
print("b ptr: ", b.storage().data_ptr())
print("c ptr: ", c.storage().data_ptr())

after the code, you will see something like:

w ptr:  16615296
a ptr:  16615296
b ptr:  16615296
c ptr:  57524864

which explains why the variables change the way they do: a and b are views of w, while c has its own storage. For example, if you change a=w[0] to a[:]=w[0] (so that the value is copied into a's preallocated storage instead of rebinding a to a view), you can see the difference:

w:  tensor([0.9990, 1.9990, 2.9990, 3.9990], requires_grad=True)
a:  tensor([1])
b:  tensor([0.9990, 1.9990], grad_fn=<AsStridedBackward>)
c:  tensor([1, 3])
w ptr:  42055552
a ptr:  41892352
b ptr:  42055552
c ptr:  82965120
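
To make the view behaviour more concrete, here is a small standalone sketch (my own example, not part of your script) showing that an in-place change to w is visible through a and b but not through c:

import torch

w = torch.tensor([1., 2., 3., 4.], requires_grad=True)

a = w[0]              # rebinds the name a to a view of w's storage
b = w[0:2]            # slicing also returns a view of the same storage
c = torch.zeros(2)
c[0] = w[0].detach()  # copies the current value into c's own, separate storage

with torch.no_grad():
    w += 1.0          # modify w in place (roughly what opt.step() does to a parameter)

print(a)  # reflects the new w[0], because a shares w's storage
print(b)  # reflects the new w[0:2], for the same reason
print(c)  # unchanged: the copy happened before w was modified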

Hmm, I see! Is there any solution to update c without defining it again?

It depends on what your purpose is here. In general, c won't be updated in this situation because of the direction of the dependencies: right now it is "c depends on w" and "the gradient depends on w." For c to be affected by the gradient step, this ordering needs to change so that the loss (and therefore the gradient) also depends on c. Otherwise there is no reason to update c, since the loss doesn't depend on it!

Here’s an example of a change to do this, but it may not be exactly what you want:

c = torch.tensor([3.0], requires_grad=True)  # c is now a leaf tensor with its own gradient
temp = torch.cat((w[:1] + c[:1], w[1:]))     # the loss now depends on both w and c
opt = optim.Adam([w, c])                     # pass c to the optimizer as well
loss = temp.sum()
loss.backward()
...
w:  tensor([0.9990, 1.9990, 2.9990, 3.9990], requires_grad=True)
a:  tensor(0.9990, grad_fn=<AsStridedBackward>)
b:  tensor([0.9990, 1.9990], grad_fn=<AsStridedBackward>)
c:  tensor([2.9990], requires_grad=True)

This is a different case. In my case c depends on w, because c[0]=w[0]. So I want that, when w gets updated, c also gets updated automatically, i.e. c[0]=w[0] such that w[0] is the new value.

Right, but unless c is used in some computation that affects the loss, it won’t be updated.
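
If the goal is just to have c mirror w[0] after every optimizer step, rather than to have c receive its own gradient, one option is to simply re-copy the value after each step. A minimal sketch of that manual re-sync (my own example; note that c is made a float tensor here so the copied value is not truncated):

import torch
import torch.optim as optim

w = torch.tensor([1., 2., 3., 4.], requires_grad=True)
c = torch.tensor([3., 3.])

opt = optim.Adam([w])

for _ in range(10):
    opt.zero_grad()
    loss = w.sum()
    loss.backward()
    opt.step()

    # Re-sync c with the freshly updated w; torch.no_grad() keeps the copy out of the graph.
    with torch.no_grad():
        c[0] = w[0]

print("w: ", w)
print("c: ", c)  # c[0] now matches the updated w[0]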
