How does clone interact with backpropagation?

I was wondering: how does the .clone() function interact with backpropagation?

For example, I have two pieces of code and I don’t understand what the difference between them is. Can someone explain the difference to me (and maybe why clone is even needed for backprop to work “properly”, whatever “properly” means)?

Example 1:

import torch
from torch.autograd import Variable

x = Variable(torch.rand(2, 1), requires_grad=False)  # data set
w = Variable(torch.rand(2, 2), requires_grad=True)   # first layer
v = Variable(torch.rand(2, 2), requires_grad=True)   # last layer
w[:, 1] = v[:, 0]
y = torch.matmul(v, torch.matmul(w, x))

vs. Example 2:

import torch
from torch.autograd import Variable

x = Variable(torch.rand(2, 1), requires_grad=False)  # data set
w = Variable(torch.rand(2, 2), requires_grad=True)   # first layer
v = Variable(torch.rand(2, 2), requires_grad=True)   # last layer
w[:, 1] = v[:, 0]
y = torch.matmul(v, torch.matmul(w.clone(), x))

Why do we need .clone() in the second one? What does “backprop working properly” mean?


Both of your scripts fail for me.

Short reason:

RuntimeError: a leaf Variable that requires grad has been used in an in-place operation.

If you want to perform in-place operations (like w[:, 1] = v[:, 0]), you can clone the Variable before the in-place operation. The clone is not a leaf Variable anymore, so the in-place assignment is allowed:

w1 = w.clone()
w1[:, 1] = v[:, 0]
y = torch.matmul(v, torch.matmul(w1, x))
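
Continuing that snippet, backward now runs and reaches both leaves (a minimal sketch; note that w.grad’s second column comes out as zeros, because that column of the clone was overwritten by v[:, 0], so the output no longer depends on it):

y.sum().backward()
print(w.grad)  # column 1 is all zeros: that column of w1 was overwritten by v[:, 0]
print(v.grad)  # v receives gradients both from the outer matmul and from the copied column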

Using in-place operations on Variables is tricky in many cases. A lot of frameworks don’t support them at all and just perform copies instead. PyTorch supports in-place operations, but because other operations may need the original content of a Variable to compute the backward pass, you can’t modify it in place, or else you would get wrong gradients.
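
As a minimal sketch of what would go wrong (my own toy example): sigmoid saves its output for the backward pass, so modifying that output in place makes autograd raise an error instead of silently computing wrong gradients.

import torch

a = torch.randn(3, requires_grad=True)
b = a.sigmoid()   # sigmoid's backward pass needs the values stored in b
b[0] = 0.0        # in-place modification of a tensor needed for backward
try:
    b.sum().backward()
except RuntimeError as e:
    print(e)      # reports that a variable needed for gradient computation was modified by an inplace operation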


If I make a copy of a Variable and extract a slice of the copy, does the gradient only backpropagate through the sliced part?
What if this Variable is a model parameter?
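
For concreteness, here is the kind of thing I mean (a toy example; p just stands in for a model parameter). Only the sliced entries of the original seem to receive gradient, the rest stay zero:

import torch

p = torch.randn(2, 2, requires_grad=True)  # stands in for a model parameter
out = p.clone()[:, 0].sum()                # use only the first column of the copy
out.backward()
print(p.grad)                              # ones in column 0, zeros elsewhere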


Funny, I was wondering exactly what the title of this question is asking: how does backprop interact with clone?

I built this simple script:

## print clone backward
import torch

a = torch.tensor([1, 2, 3.], requires_grad=True)
c = a.sigmoid()
c_cloned = c.clone()

print(f'c_cloned: {c_cloned}')

and got this:

c_cloned: tensor([0.7311, 0.8808, 0.9526], grad_fn=<CloneBackward>)

but notice that autograd recorded the clone operation (i.e. grad_fn=<CloneBackward>), which I thought was very weird.

So my question is… how does backprop react to a clone operation? What does it even mean to backprop through a clone operation?


All incoming gradients to the cloned tensor will be propagated to the original tensor as seen here:

x = torch.randn(2, 2, requires_grad=True)
y = x.clone()
y.retain_grad()

z = y**2
z.mean().backward()

print(y.grad)  # d(z.mean())/dy = y / 2
print(x.grad)  # identical values: the gradient passes straight through the clone to x
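
A quick sanity check (continuing the snippet above) confirms that CloneBackward is just an identity for the gradient:

print(torch.equal(x.grad, y.grad))  # True: the clone's gradient flows back to the original unchanged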