Gradient computation error

import torch
import torch.nn as nn

embedding = nn.Parameter(torch.rand(2, 3))
d = nn.Parameter(torch.rand(3, 3))
user_embeddings = embedding.clone()
user_embedding_input = user_embeddings[0]
a = user_embedding_input * 3                 # option 1
print(a)
a = torch.matmul(d, user_embedding_input)    # option 2
print(a)
user_embeddings[0] = a
loss = a.sum()
loss.backward()

I found that using option 2 causes this error, but option 1 works well. Why?

one of the variables needed for gradient computation has been modified by an inplace operation

Hi Bingqing!

user_embedding_input is a view into user embeddings and references
the same underlying data.
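
(For illustration, a minimal sketch of my own, not part of the original post: plain indexing like user_embeddings[0] returns a view that shares memory with the parent tensor, so a write through the parent is visible through the view.)

import torch

base = torch.rand(2, 3)
row = base[0]                              # plain indexing returns a view, not a copy
print(row.data_ptr() == base.data_ptr())   # True: both point at the same memory
base[0] = 0.0                              # writing through the parent ...
print(row)                                 # ... is visible through the view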

During backpropagation, the gradient of a with respect to
user_embedding_input is 3 and doesn’t depend on
user_embedding_input, so autograd doesn’t care whether
user_embedding_input “has been modified inplace.”

Here, the gradient of a with respect to d is (morally speaking)
user_embedding_input, so autograd will complain during
backpropagation if user_embedding_input has been modified.

Because user_embedding_input and user_embeddings refer to
the same data, the line user_embeddings[0] = a does modify
user_embedding_input. So autograd does complain during
option 2’s backpropagation about user_embedding_input having
been modified.
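
(A small sketch of mine that makes the bookkeeping visible: a view shares its version counter with its base, so an inplace write through user_embeddings also bumps the counter recorded for the view. The _version attribute shown here is internal, but it is what the saved-tensor check compares.)

import torch

user_embeddings = torch.rand(2, 3)
user_embedding_input = user_embeddings[0]   # view: shares storage and version counter
print(user_embedding_input._version)        # 0
user_embeddings[0] = torch.zeros(3)         # inplace write through the parent tensor
print(user_embedding_input._version)        # incremented: the saved-tensor check would now fail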

Two things have to happen for the error to occur. First, something
has to modify user_embedding_input, and this happens because
user_embedding_input is a view into user_embeddings, which
you do modify.

And second, user_embedding_input itself has to be used during
backpropagation to calculate gradients. user_embedding_input
is used in option 2’s gradient calculation, but not in option 1’s, so
you only get the error for option 2.
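
(Putting the two conditions together, one way to avoid the error is to give matmul its own copy of the row, so the later write into user_embeddings no longer touches the tensor autograd saved. This is a sketch of mine, not something from the original post.)

import torch
import torch.nn as nn

embedding = nn.Parameter(torch.rand(2, 3))
d = nn.Parameter(torch.rand(3, 3))
user_embeddings = embedding.clone()
user_embedding_input = user_embeddings[0].clone()   # a copy, not a view
a = torch.matmul(d, user_embedding_input)           # autograd saves the copy
user_embeddings[0] = a                              # no longer touches the saved tensor
loss = a.sum()
loss.backward()                                     # runs without the error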

Best.

K. Frank


Got it!! So kind of you, thank you!!!

import torch
import torch.nn as nn

embedding = nn.Parameter(torch.rand(2, 3))
d = nn.Parameter(torch.rand(3, 3))
user_embeddings = embedding.clone()
user_embedding_input = user_embeddings[[0], :]   # advanced indexing: a copy, not a view
a = torch.matmul(user_embedding_input, d)
print(a)
user_embeddings[[0], :] = a
loss = a.sum()
loss.backward()

According to your reply, I learned that [0] gives a SelectBackward (a view), so I changed it to [[0], :] to get an IndexBackward (a copy), and it works well.
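
(A quick check of my own, not from the post, showing the difference: plain indexing records a SelectBackward and shares memory, while list indexing does advanced indexing, which copies the data and records an IndexBackward. Exact grad_fn names may vary slightly by PyTorch version.)

import torch
import torch.nn as nn

embedding = nn.Parameter(torch.rand(2, 3))
user_embeddings = embedding.clone()

view_row = user_embeddings[0]          # view of the same storage
copy_row = user_embeddings[[0], :]     # advanced indexing makes a copy

print(view_row.grad_fn)                # e.g. <SelectBackward0 ...>
print(copy_row.grad_fn)                # e.g. <IndexBackward0 ...>
print(view_row.data_ptr() == user_embeddings.data_ptr())   # True: shared memory
print(copy_row.data_ptr() == user_embeddings.data_ptr())   # False: its own memory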

But I still have a question: in the code below I add a “question line” that changes user_embedding_input. Isn’t that an inplace operation? Why does it still work, and why does the printed d.grad equal user_embedding_input as it was before the assignment?

import torch
import torch.nn as nn

embedding = nn.Parameter(torch.rand(2, 3))
d = nn.Parameter(torch.rand(3, 3))
user_embeddings = embedding.clone()
user_embedding_input = user_embeddings[[0], :]
a = torch.matmul(user_embedding_input, d)
print(a)
user_embeddings[[0], :] = a
user_embedding_input = 3   # question line
loss = a.sum()
loss.backward()
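
(A note of my own on that last question, offered as my reading rather than an answer from the thread: user_embedding_input = 3 is ordinary Python name rebinding, not an inplace tensor operation. It only repoints the variable name; the tensor that matmul saved for backward is untouched, which is why backward still succeeds and d.grad reflects the original values. A minimal illustration:)

import torch

x = torch.rand(3)
saved = x        # stands in for the tensor autograd saved during matmul
x = 3            # rebinds the Python name only; the tensor object is untouched
print(saved)     # still the original values; no inplace modification happened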