Backprop in Branched Network and reusing variable names


(Apoorv Agnihotri) #1

Below is the high-level code. Would there be any issues from reusing variable names the way I do here?

NOTE: Jump below; the issue got cleared. Now I need some ideas to avoid wasting GPU memory.

Variables: `main` and `little`

  • The first pass is some normal convolutions:
# Conv
main = self.conv1(x)
main = self.bn1(main)
main = self.relu(main)

In the second pass, we form 2 branches
and add their outputs at the end.

# pass 2 | bL-module
little = main
main = self.conv2(main)
main = self.bn2(main)
main = self.relu(main)
little = self.littleblock(little)
main += little

In the 3rd pass, we again form 2 branches, pass the input through both, and add their outputs inside transition1.

# pass 3 | `ResBlockB`s & `ResBlockL`s  planes = 64
little = main
main = self.big_layer1(main)
little = self.little_layer1(little)
main = self.transition1(main, little)

def transition1(self, x1, x2):
    assert x1.shape == x2.shape
    out = x1 + x2  # merge via add
    out = self.conv(out)  # some conv defined in __init__

    return out
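As a sanity check, the transition step above can be fleshed out as a small module. The 1x1 convolution here is an assumed placeholder, since the actual conv used in the network is not shown:

```python
import torch
import torch.nn as nn

class Transition(nn.Module):
    """Merge two equally-shaped branches by addition, then convolve."""
    def __init__(self, channels):
        super().__init__()
        # 1x1 conv as a placeholder; the real network may use something else
        self.conv = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x1, x2):
        assert x1.shape == x2.shape
        return self.conv(x1 + x2)  # merge via add, then conv
```

For example, `Transition(64)(a, b)` on two `(N, 64, H, W)` tensors returns a tensor of the same shape.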

In the 4th pass, the output of the last module is again fed to 2 branches and finally merged by transition2.

# pass 4 | planes = 128
little = main
main = self.big_layer2(main)
little = self.little_layer2(little)
main = self.transition2(main, little)

Would there be issues with backpropagation when I try to optimize this network? AFAIK there should not be any, but I wanted to confirm: by reusing the main and little variables, would I be overwriting things (like parts of the computation graph required to calculate the grads) that should not be overwritten, causing unwanted side effects?

I am sorry if this is a basic question; I just wanted some confirmation that there is no issue.
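For what it's worth, here is a minimal sketch (names are made up) of why plain rebinding is safe: each old tensor stays alive inside the autograd graph even after its Python name is reassigned, so backward still works through every overwritten step:

```python
import torch

x = torch.randn(4, requires_grad=True)
main = torch.exp(x)   # node 1 in the graph
main = main * 3       # rebind the name; node 1 is still held by the graph
main = main.sum()     # rebind again
main.backward()       # gradients flow through every overwritten step

# d/dx of sum(3 * exp(x)) is 3 * exp(x)
assert torch.allclose(x.grad, 3 * torch.exp(x))
```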


(Apoorv Agnihotri) #2

The issue got cleared: PyTorch itself started complaining that you can't just overwrite variables in place. However, now that I'm using different variable names, I might be hogging GPU memory. Could you confirm, and suggest a solution if that is the case?

Thanks.
Below is the updated code.

def forward(self, x):
    # Conv
    base1 = self.conv1(x)
    base1 = self.bn1(base1)
    base1 = self.relu(base1)

    # pass 2 | bL-module
    little1 = base1; big1 = base1;
    big1 = self.conv2(big1)
    big1 = self.bn2(big1)
    big1 = self.relu(big1)
    little1 = self.littleblock(little1)
    assert (big1.shape == little1.shape)
    base2 = little1 + big1

    # pass 3 | `ResBlockB`s & `ResBlockL`s  planes = 64
    little2 = base2; big2 = base2;
    big2 = self.big_layer1(big2)
    little2 = self.little_layer1(little2)
    # print ('1st layer passed')
    base3 = self.transition1([big2, little2])

    # pass 4 | planes = 128
    little3 = base3; big3 = base3;
    big3 = self.big_layer2(big3)
    little3 = self.little_layer2(little3)
    # print ('2nd layer passed')
    base4 = self.transition2([big3, little3])

    # pass 5 | planes = 256
    little4 = base4; big4 = base4;
    big4 = self.big_layer3(big4)
    little4 = self.little_layer3(little4)
    # print ('3rd layer passed')
    out = self.transition3([big4, little4])

    # pass 6 | Res_Block | planes = 512
    out = self.res_layer1(out)

    # avg pooling
    out = self.avgpool(out)
    out = out.view(out.size(0), -1)
    out = self.fc(out)

    return out
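On the complaint mentioned above: what PyTorch usually objects to is not rebinding a Python name, but an in-place operation (like `main += little`) on a tensor whose value autograd still needs for the backward pass. A minimal sketch of that error, assuming recent PyTorch behaviour:

```python
import torch

x = torch.randn(3, requires_grad=True)
y = torch.exp(x)   # exp saves its output for the backward pass
y += 1             # in-place edit of a tensor the graph still needs
try:
    y.sum().backward()
    raised = False
except RuntimeError:
    # "one of the variables needed for gradient computation
    #  has been modified by an inplace operation"
    raised = True
```

Writing `y = y + 1` instead (an out-of-place add, which the renamed code above effectively does) avoids the error.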

#3

Hi,

Let me share something I learned earlier; I don't know if it will help you.
Here is what I tried:

intermediate = torch.randn(1,3,3,3)
copy_intermediate = intermediate

print(id(intermediate))
print(id(copy_intermediate))
# the same id
print(intermediate is copy_intermediate)
# True

It seems that assignment makes both variables point to the same object; no copy is made. (This holds even for immutable types like int and str, so the parenthetical caveat I first wrote was wrong.) So I think you don't need to worry about memory consumption from the assignments themselves.
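The same check works for immutable types too, which shows assignment itself never copies:

```python
x = 10 ** 6            # an int object
y = x
assert x is y          # same object: assignment just adds a name

s = "hello world"
t = s
assert s is t          # same for strings
assert id(s) == id(t)
```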

Did you actually observe high GPU memory usage with the posted code?


(Apoorv Agnihotri) #4

Hi, thanks a lot for the insight about the shared id.
No, I didn't see any GPU wastage; I hadn't actually run the model yet, I was just concerned that it potentially could waste memory.

Further, is this behaviour of assignment pointing to the same object a feature of PyTorch, or is it the default behaviour in Python?


#5

I am glad to help you!

I think it is a feature of Python, not PyTorch. You could run your network in practice and see whether there is any GPU memory wastage; if our conclusion is wrong, please correct me.
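A plain-Python illustration (no PyTorch involved) of the difference between mutating a shared object and rebinding a name:

```python
a = [1, 2, 3]
b = a                      # second name for the same list object
b.append(4)
assert a == [1, 2, 3, 4]   # mutation is visible through both names

b = [0]                    # rebinding b detaches the name, not the object
assert a == [1, 2, 3, 4]   # a still refers to the original list
```

Rebinding a tensor name in a `forward` works the same way: the old object is untouched, and it stays alive as long as something (like the autograd graph) still references it.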

Here is the thread on Stack Overflow which helped me a lot a few days ago.