Re-using layers in a model?

Hi all, I am learning PyTorch and have been practicing by implementing the YOLOv3 model.

The model itself has blocks of layers which are “repeated”, in a sense. For example, the following sequence of convolutions is applied several times:

nn.Conv2d(256, 512, kernel_size=1, stride=1, padding=0)
nn.Conv2d(512, 256, kernel_size=3, stride=1, padding=1)
nn.Conv2d(256, 512, kernel_size=1, stride=1, padding=0)
nn.Conv2d(512, 256, kernel_size=3, stride=1, padding=1)
...

In such a situation, would we need to explicitly list out each convolutional layer in the model definition? I understand that each layer has its own weight tensor, so I am not sure whether we can simply re-use the layers.

Welcome to the forums!

Yes, it is definitely possible! Whether or not you want to do it is a design decision. Broadly, sharing layers lowers the memory cost of the parameters while also reducing the degrees of freedom (capacity) of the model.
One of the important contributions of DenseNet was showing that (for classification) you can feed the feature maps of earlier layers as additional inputs to deeper layers, which reduces the number of parameters the model needs and encourages it to learn features that are reusable at several depths.

You could share the parameters themselves by calling the same module more than once:

import torch
import torch.nn as nn

c = nn.Conv2d(3, 3, (3, 3))

im = torch.randn(1, 3, 8, 8, requires_grad=True)

# calling the same module twice applies the same weight and bias both times
o = c(im)
o = c(o)

In a render of the backprop graph, the weight of c feeds both convolution calls, which confirms the parameters are shared.
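As a quick sanity check (a sketch continuing the snippet above), backpropagating through the stacked calls accumulates the gradient contributions from both applications into the single shared weight:

# gradients from both uses of c are summed into one weight.grad tensor
o.sum().backward()
print(c.weight.grad.shape)  # torch.Size([3, 3, 3, 3])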

Or a DenseNet-style idea, where the feature maps of an earlier layer are concatenated with the output of a later one:

c1 = nn.Conv2d(3, 3, (3, 3), 1, 1, bias=False)
c2 = nn.Conv2d(6, 3, (3, 3), 1, 1, bias=False)

im = torch.randn(1, 3, 9, 9, requires_grad=True)

o = c1(im)
# dense connection: concatenate the earlier feature maps along the channel dim
o = torch.cat([im, o], dim=1)
o = c2(o)

Again, a render of the backprop graph shows the concatenation joining the earlier feature maps with the new output.

I am not sure whether YOLO uses parameter sharing; I do seem to remember that YOLOv3 uses residual connections. It would be an interesting experiment to see whether parameter sharing improves performance!


Thank you so much for the detailed explanation!
Just to confirm my understanding: re-using a layer effectively counts as parameter sharing (using the same weights)? And if so, is there a way to avoid that without having to explicitly list out every layer?


Yes, re-using a layer means the same parameters are used in each call.
If you are dealing with a lot of (simple to initialize) layers, you can create them in a loop and append each one to an nn.ModuleList.
Later you can pass the layers from this list to an nn.Sequential module, or simply iterate over the list in your forward method, as sketched below.
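
A minimal sketch of that pattern (the class name RepeatedBlocks and the repeat count are just illustrative; the 1x1/3x3 block is taken from the question) could look like:

import torch
import torch.nn as nn

class RepeatedBlocks(nn.Module):
    # hypothetical name: builds num_repeats independent copies of the
    # 1x1 / 3x3 block, each with its own weight tensors
    def __init__(self, num_repeats=4):
        super().__init__()
        self.layers = nn.ModuleList()
        for _ in range(num_repeats):
            self.layers.append(nn.Conv2d(256, 512, kernel_size=1, stride=1, padding=0))
            self.layers.append(nn.Conv2d(512, 256, kernel_size=3, stride=1, padding=1))

    def forward(self, x):
        # iterate over the list; each layer has its own parameters
        for layer in self.layers:
            x = layer(x)
        return x

model = RepeatedBlocks()
out = model(torch.randn(1, 256, 13, 13))
print(out.shape)  # torch.Size([1, 256, 13, 13])

Each module created in the loop gets its own randomly initialized parameters, so nothing is shared unless you deliberately append the same module instance more than once.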
