For progressive networks, should I slice off the pretrained layer or copy over its weights? What's the difference?

inkplay · July 31, 2018, 12:28am

For example, say my starting network is

Conv2d(3, 256, 1) -> Conv2d(256, 512, 4, 2, 1)

and I want to grow the network on the left (input) side of it, so the next stage's structure should be

Conv2d(3, 128, 1) -> Conv2d(128, 256, 4, 2, 1) -> Conv2d(256, 512, 4, 2, 1).

In this case, should I slice the old conv layer out of the old network or copy over its state_dict? Currently I have a working model where I slice off the old layer, destroy the old network, put the 2 new layers plus the sliced-off old layer in a list, and unpack that list into nn.Sequential() to make a new network. Finally, I grab the new network's params and feed them to a new optimizer. My model works and produces good results, but I am not sure if I am following best practice.
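To make the two options concrete, here is a minimal sketch of both, using the layer shapes from above. The variable names, the Adam optimizer, the learning rate, and the 64x64 input size are assumptions for illustration only and are not taken from my actual model. The one difference it shows: slicing reuses the same module object, so the parameters are shared with the old network, while loading the state_dict copies the values into separate tensors.

```python
import torch
import torch.nn as nn

# Stage 1: the starting network from the post.
old_net = nn.Sequential(
    nn.Conv2d(3, 256, 1),
    nn.Conv2d(256, 512, 4, 2, 1),
)

# Option A: slice. old_net[1] is the very same nn.Module object, so the
# new network shares its parameter tensors with the old network.
sliced_net = nn.Sequential(
    nn.Conv2d(3, 128, 1),
    nn.Conv2d(128, 256, 4, 2, 1),
    old_net[1],
)

# Option B: copy. Build a fresh layer of the same shape and load the
# trained values into it; its parameters are independent tensors.
fresh = nn.Conv2d(256, 512, 4, 2, 1)
fresh.load_state_dict(old_net[1].state_dict())
copied_net = nn.Sequential(
    nn.Conv2d(3, 128, 1),
    nn.Conv2d(128, 256, 4, 2, 1),
    fresh,
)

# Slicing shares the tensor; copying gives equal values in a new tensor.
shares_storage = sliced_net[2].weight is old_net[1].weight
same_values = torch.equal(copied_net[2].weight, old_net[1].weight)
independent = (
    copied_net[2].weight.data_ptr() != old_net[1].weight.data_ptr()
)

# Either way, the two new layers have new parameters, so a new optimizer
# is built over the new network (Adam and lr are assumed, not from the post).
optimizer = torch.optim.Adam(sliced_net.parameters(), lr=1e-4)

# Sanity check that the grown network still runs end to end.
x = torch.randn(1, 3, 64, 64)  # assumed input size
out_shape = tuple(sliced_net(x).shape)
```

If the old network is destroyed right after growing, as in my current setup, both options should start the new network from the same weights. The sharing only matters if the old network is kept around and trained or modified afterwards.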