What's the difference if I add clone() to every feature in a multi-branch network?

I was reading the code below:

    import torch
    import torch.nn as nn


    class MultiLabelDemo(nn.Module):
        def __init__(self):
            super(MultiLabelDemo, self).__init__()
            self.main_block = nn.Sequential(
                nn.Conv2d(in_channels=3, out_channels=96, kernel_size=11, stride=4),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_channels=96, out_channels=96, kernel_size=1, stride=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_channels=96, out_channels=96, kernel_size=1, stride=1),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True)
            )
            self.tail_block1 = nn.Sequential(
                nn.Conv2d(in_channels=96, out_channels=256, kernel_size=5, stride=1, padding=2),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_channels=256, out_channels=256, kernel_size=1, stride=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_channels=256, out_channels=256, kernel_size=1, stride=1),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True)
            )
            self.tail_block2 = nn.Sequential(
                nn.Conv2d(in_channels=96, out_channels=256, kernel_size=5, stride=1, padding=2),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_channels=256, out_channels=256, kernel_size=1, stride=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_channels=256, out_channels=256, kernel_size=1, stride=1),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True)
            )

        def forward(self, data):
            x = self.main_block(data)
            y1 = self.tail_block1(x)
            y2 = self.tail_block2(x)
            return y1, y2

When I have multiple branches, does it make a difference whether I call clone() on the input of only one of them or on all of them?

    def forward(self, data):
        x = self.main_block(data)
        y1 = self.tail_block1(x)
        y2 = self.tail_block2(x.clone())
        y3 = self.tail_block3(x.clone())
        y1 = y1 * (y2 + y3)
        return y1

In my code, I found that if I don't use clone(), I get this error:

    RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation.

Hi,

The only difference it makes is that it copies the tensor, so the output values and the gradients will be the same.
The one difference that matters is in-place operations: modifying a tensor in place is not allowed if its original value is still needed to compute gradients. If you clone() first, the in-place operation is applied to the copy, so it's not a problem.
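
As a concrete illustration (a minimal sketch, not the model from the question): torch.exp() saves its output for the backward pass, so updating that output in place triggers exactly this error, while cloning first leaves the saved tensor intact:

    import torch

    x = torch.randn(4, requires_grad=True)

    # exp() saves its output for the backward pass, so an in-place update
    # on that output invalidates the saved tensor and backward() fails.
    y = x.exp()
    y += 1
    try:
        y.sum().backward()
    except RuntimeError as e:
        print("in-place on a saved tensor:", e)

    # Cloning first means the in-place update only touches the copy;
    # the tensor saved by exp() stays intact and backward() works.
    y = x.exp()
    z = y.clone()
    z += 1
    (y.sum() + z.sum()).backward()
    print(x.grad)  # 2 * exp(x): gradient flows through both y and z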

Hi

When I don't use clone(), I found that if my tail_block starts with

    nn.ReLU(inplace=True)

I get an error like this:

    RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation.

If the tail_block starts with a BatchNorm or Conv layer instead, it's fine.

But if I use clone() on every x, I don't get any errors.
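
A minimal sketch of what seems to be happening (not the original model; the layers and sizes below are arbitrary): when two branches both start with nn.ReLU(inplace=True), the second branch's ReLU overwrites the shared tensor x after the first branch has already saved it for its backward pass, so backward() fails. Cloning the input of each branch keeps those saved tensors intact:

    import torch
    import torch.nn as nn

    # Shared trunk followed by two branches that both start with an
    # in-place ReLU, mirroring the situation described above.
    trunk = nn.Conv2d(3, 8, kernel_size=3)
    branch1 = nn.Sequential(nn.ReLU(inplace=True), nn.Conv2d(8, 8, kernel_size=1))
    branch2 = nn.Sequential(nn.ReLU(inplace=True), nn.Conv2d(8, 8, kernel_size=1))

    # Without clone(): branch2's in-place ReLU rewrites the same tensor
    # that branch1 already saved for backward -> RuntimeError.
    x = trunk(torch.randn(1, 3, 16, 16))
    try:
        y1 = branch1(x)
        y2 = branch2(x)
        (y1.sum() + y2.sum()).backward()
    except RuntimeError as e:
        print("without clone:", e)

    # With clone(): each branch does its in-place work on its own copy,
    # so the tensors saved for backward are never modified.
    x = trunk(torch.randn(1, 3, 16, 16))
    y1 = branch1(x.clone())
    y2 = branch2(x.clone())
    (y1.sum() + y2.sum()).backward()
    print("with clone(): backward succeeded")

This is consistent with the observation above: a Conv or BatchNorm layer at the start of a branch does not modify its input in place, so the shared x is never overwritten and no clone() is needed.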