Should I use clone() when I am using a feature in multiple branches?

I have had this confusion for months now: should I use clone() for x in the following code?
I am not sure whether it matters, but I would appreciate it if someone could help me understand it.
Will it affect training if I go with any of the following samples?
Let's say I have a model in which, at some point, I have to feed branches of an intermediate feature into different layers and compute different outputs. I am wondering whether I need to use clone() or not.

Which one of the following models is correct?

import torch.nn as nn

# Variant 1: no clone
class Model(nn.Module):
  def __init__(self):
    super(Model, self).__init__()
    self.conv = nn.Conv2d(3, 6, 3)
    self.conv_2 = nn.Conv2d(6, 2, 3)
    self.conv_3 = nn.Conv2d(6, 20, 3)

  def forward(self, input):
    x = self.conv(input)
    x_2 = self.conv_2(x)
    x_3 = self.conv_3(x)
    return x_2, x_3

# Variant 2: clone only for the first branch
class Model(nn.Module):
  def __init__(self):
    super(Model, self).__init__()
    self.conv = nn.Conv2d(3, 6, 3)
    self.conv_2 = nn.Conv2d(6, 2, 3)
    self.conv_3 = nn.Conv2d(6, 20, 3)

  def forward(self, input):
    x = self.conv(input)
    x_2 = self.conv_2(x.clone())
    x_3 = self.conv_3(x)
    return x_2, x_3

# Variant 3: clone for both branches
class Model(nn.Module):
  def __init__(self):
    super(Model, self).__init__()
    self.conv = nn.Conv2d(3, 6, 3)
    self.conv_2 = nn.Conv2d(6, 2, 3)
    self.conv_3 = nn.Conv2d(6, 20, 3)

  def forward(self, input):
    x = self.conv(input)
    x_2 = self.conv_2(x.clone())
    x_3 = self.conv_3(x.clone())
    return x_2, x_3

You would have to clone() the input to the separate branches if any of them uses inplace operations, since the first branch would otherwise modify the shared tensor and yield unexpected results in the second (and any further) branch, as seen here:

import torch
import torch.nn as nn

lin = nn.Linear(10, 10)
branch1 = nn.ReLU(inplace=True)
branch2 = nn.Sigmoid()

# without clone
x = torch.randn(1, 10)
x = lin(x)
print(x)
> tensor([[-0.1115, -0.4749, -0.0076, -0.1911, -0.0425, -0.7667, -1.2244, -0.2573,
            1.6056, -0.1653]], grad_fn=<AddmmBackward>)

x1 = branch1(x)
print(x) # changed inplace!
> tensor([[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 1.6056,
           0.0000]], grad_fn=<ReluBackward1>)

print(x1)
> tensor([[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 1.6056,
           0.0000]], grad_fn=<ReluBackward1>)

x2 = branch2(x) # unexpected results
print(x2)
> tensor([[0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.8328,
           0.5000]], grad_fn=<SigmoidBackward>)

# with clone
x = torch.randn(1, 10)
x = lin(x)
print(x)
> tensor([[-0.1678, -0.0383,  0.3958,  0.0245, -0.3356,  0.2320,  0.0355, -0.1278,
            0.1390,  0.5725]], grad_fn=<AddmmBackward>)

x1 = branch1(x.clone())
print(x) # not changed
> tensor([[-0.1678, -0.0383,  0.3958,  0.0245, -0.3356,  0.2320,  0.0355, -0.1278,
            0.1390,  0.5725]], grad_fn=<AddmmBackward>)

print(x1)
> tensor([[0.0000, 0.0000, 0.3958, 0.0245, 0.0000, 0.2320, 0.0355, 0.0000, 0.1390,
           0.5725]], grad_fn=<ReluBackward1>)

x2 = branch2(x)
print(x2)
> tensor([[0.4581, 0.4904, 0.5977, 0.5061, 0.4169, 0.5578, 0.5089, 0.4681, 0.5347,
           0.6393]], grad_fn=<SigmoidBackward>)

Thank you for the clarification.
Would it cause any issue if I cloned the input for both the first and the second branch instead of just the second branch? Technically it should be equivalent to cloning only in the second branch… (or maybe not?)
Could you please explain the difference?

If you are unsure which branch might manipulate the input inplace, you could clone it for both branches, at the cost of a bit more memory.
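If it helps, here is a small added sketch (not from the original reply) of one way to check whether a branch manipulates its input inplace: keep a copy of the input values and compare them after running the branch. The helper name modifies_input_inplace is made up for illustration.

import torch
import torch.nn as nn

def modifies_input_inplace(module, x):
    # hypothetical helper: run `module` on x and report whether x itself changed
    ref = x.detach().clone()   # snapshot of the input values
    module(x)                  # run the branch on the original tensor
    return not torch.equal(x.detach(), ref)

x = torch.tensor([[-1.0, 0.5, -2.0]])  # contains negatives, so an inplace ReLU must change it
print(modifies_input_inplace(nn.ReLU(inplace=True), x.clone()))  # True
print(modifies_input_inplace(nn.Sigmoid(), x.clone()))           # False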


Hi! The model in my current project uses lots of branches, and this thread caught my interest. Is it ok not to clone() the input to any of the branches if we have taken care to avoid inplace operations? The motivation for this would be reducing memory usage.

Yes, it should be fine. Just make sure not to reassign the same variable name:

x = self.module1(x)
x = self.module2(x)

in case you want to pass the same “input” x to self.module2 as well.
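As a minimal sketch of this advice (the class and module names below are made up for illustration), a clone-free branching forward can keep the shared feature under its own name and use only out-of-place operations in the branches:

import torch
import torch.nn as nn

class BranchedModel(nn.Module):
    # hypothetical example: both branches read the shared feature x without clone(),
    # which is safe because neither branch uses inplace operations
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 6, 3)
        self.branch_a = nn.Sequential(nn.ReLU(inplace=False), nn.Conv2d(6, 2, 3))
        self.branch_b = nn.Sequential(nn.Sigmoid(), nn.Conv2d(6, 20, 3))

    def forward(self, input):
        x = self.conv(input)       # shared feature, never reassigned
        out_a = self.branch_a(x)   # produces a new tensor, x stays untouched
        out_b = self.branch_b(x)
        return out_a, out_b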


@char-t Can you please clarify how to take care to avoid inplace operations without using clone()?

@char-t @ptrblck
Do you mean that something like this is fine to do and that it is equivalent to using clone()?

x1 = input
x2 = input
x1 = self.module1(x1)
x2 = self.module2(x2)

Is it okay to do it like this?

No, this won’t work, as x1 and x2 would just be references to the same tensor as input (no copy is made), so an inplace operation in one branch still changes the other.
You could use my posted code snippet for a quick check:

lin = nn.Linear(10, 10)
branch1 = nn.ReLU(inplace=True)
branch2 = nn.Sigmoid()

# without clone
x = torch.randn(1, 10)
x = lin(x)
print(x)
> tensor([[-0.9229, -0.3109, -0.3517,  0.1832,  0.2353, -0.5287, -0.9106, -0.1691,
           -0.6657,  0.1129]], grad_fn=<AddmmBackward>)

x1 = x
x2 = x
x1_out = branch1(x1)
print(x) # changed inplace!
> tensor([[0.0000, 0.0000, 0.0000, 0.1832, 0.2353, 0.0000, 0.0000, 0.0000, 0.0000,
           0.1129]], grad_fn=<ReluBackward1>)
print(x1) # changed inplace!
> tensor([[0.0000, 0.0000, 0.0000, 0.1832, 0.2353, 0.0000, 0.0000, 0.0000, 0.0000,
           0.1129]], grad_fn=<ReluBackward1>)
print(x1_out)
> tensor([[0.0000, 0.0000, 0.0000, 0.1832, 0.2353, 0.0000, 0.0000, 0.0000, 0.0000,
           0.1129]], grad_fn=<ReluBackward1>)
print(x2)
> tensor([[0.0000, 0.0000, 0.0000, 0.1832, 0.2353, 0.0000, 0.0000, 0.0000, 0.0000,
           0.1129]], grad_fn=<ReluBackward1>)
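To make the sharing explicit, here is a small added check (not part of the original reply): plain assignment gives you the same tensor under a new name, while clone() allocates separate memory.

import torch

x = torch.randn(1, 10)
x1 = x            # same tensor object, same underlying storage
x2 = x.clone()    # new tensor with its own storage

print(x1.data_ptr() == x.data_ptr())   # True  -> inplace ops on x1 also change x
print(x2.data_ptr() == x.data_ptr())   # False -> x2 is an independent copy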