What's the difference if I add clone() to every feature in a multi-branch network?

I was reading the code below:

    import torch
    import torch.nn as nn


    class MultiLabelDemo(nn.Module):
        def __init__(self):
            super(MultiLabelDemo, self).__init__()
            self.main_block = nn.Sequential(
                nn.Conv2d(in_channels=3, out_channels=96, kernel_size=11, stride=4),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_channels=96, out_channels=96, kernel_size=1, stride=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_channels=96, out_channels=96, kernel_size=1, stride=1),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True)
            )
            self.tail_block1 = nn.Sequential(
                nn.Conv2d(in_channels=96, out_channels=256, kernel_size=5, stride=1, padding=2),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_channels=256, out_channels=256, kernel_size=1, stride=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_channels=256, out_channels=256, kernel_size=1, stride=1),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True)
            )
            self.tail_block2 = nn.Sequential(
                nn.Conv2d(in_channels=96, out_channels=256, kernel_size=5, stride=1, padding=2),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_channels=256, out_channels=256, kernel_size=1, stride=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_channels=256, out_channels=256, kernel_size=1, stride=1),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True)
            )

        def forward(self, data):
            x = self.main_block(data)
            y1 = self.tail_block1(x)
            y2 = self.tail_block2(x)
            return y1, y2

When I have multiple branches, does it make a difference whether I call clone() on the input of only one of them or on all of them?

    def forward(self, data):
        x = self.main_block(data)
        y1 = self.tail_block1(x)
        y2 = self.tail_block2(x.clone())
        y3 = self.tail_block3(x.clone())
        y1 = y1 * (y2 + y3)
        return y1

In my code, I found that if I don't use clone(), I get this error:

    RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation.

Hi,

The only difference it makes is that it copies the tensor, so the output values and the gradients will be the same.
The one difference that matters is in-place operations: modifying a tensor in place is not allowed if its original value is still needed to compute gradients. If you clone() first, the in-place operation is applied to the copy, so it's not a problem.
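
As a concrete illustration (a minimal sketch, not the model from the question): torch.exp() saves its output for the backward pass, so updating that output in place triggers exactly this error, while cloning first leaves the saved tensor intact:

    import torch

    x = torch.randn(4, requires_grad=True)

    # exp() saves its output for the backward pass, so an in-place update
    # on that output invalidates the saved tensor and backward() fails.
    y = x.exp()
    y += 1
    try:
        y.sum().backward()
    except RuntimeError as e:
        print("in-place on a saved tensor:", e)

    # Cloning first means the in-place update only touches the copy;
    # the tensor saved by exp() stays intact and backward() works.
    y = x.exp()
    z = y.clone()
    z += 1
    (y.sum() + z.sum()).backward()
    print(x.grad)  # 2 * exp(x): gradient flows through both y and z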

Hi

When I don't use clone(), I found that if my tail_block starts with

    nn.ReLU(inplace=True)

I get an error like this:

    RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation.

If the tail_block starts with a BatchNorm or Conv layer instead, it's fine.

But if I use clone() on every x, I don't get any errors.
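
A minimal sketch of what seems to be happening (not the original model; the layers and sizes below are arbitrary): when two branches both start with nn.ReLU(inplace=True), the second branch's ReLU overwrites the shared tensor x after the first branch has already saved it for its backward pass, so backward() fails. Cloning the input of each branch keeps those saved tensors intact:

    import torch
    import torch.nn as nn

    # Shared trunk followed by two branches that both start with an
    # in-place ReLU, mirroring the situation described above.
    trunk = nn.Conv2d(3, 8, kernel_size=3)
    branch1 = nn.Sequential(nn.ReLU(inplace=True), nn.Conv2d(8, 8, kernel_size=1))
    branch2 = nn.Sequential(nn.ReLU(inplace=True), nn.Conv2d(8, 8, kernel_size=1))

    # Without clone(): branch2's in-place ReLU rewrites the same tensor
    # that branch1 already saved for backward -> RuntimeError.
    x = trunk(torch.randn(1, 3, 16, 16))
    try:
        y1 = branch1(x)
        y2 = branch2(x)
        (y1.sum() + y2.sum()).backward()
    except RuntimeError as e:
        print("without clone:", e)

    # With clone(): each branch does its in-place work on its own copy,
    # so the tensors saved for backward are never modified.
    x = trunk(torch.randn(1, 3, 16, 16))
    y1 = branch1(x.clone())
    y2 = branch2(x.clone())
    (y1.sum() + y2.sum()).backward()
    print("with clone(): backward succeeded")

This is consistent with the observation above: a Conv or BatchNorm layer at the start of a branch does not modify its input in place, so the shared x is never overwritten and no clone() is needed.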