FPN with VGG16 Backbone for FasterRCNN

SeucheAchat9115 · October 10, 2022, 6:33am

Hi, I would like to use the VGG16 backbone in combination with an FPN in the Faster R-CNN object detector.
I load the VGG16 as follows:

backbone = torchvision.models.vgg16()
backbone = backbone.features[:-1]
backbone.out_channels = 512

Now I would like to attach an FPN to the VGG as follows:

backbone = BackboneWithFPN(backbone, return_layers, in_channels_list, out_channels)

which I found in the documentation. Can anybody help me construct return_layers, in_channels_list and out_channels for the VGG16 case? I only found examples for ResNet, and I cannot get it running for VGG at the moment.
I would be glad for any kind of help here.
Thanks in advance.

SeucheAchat9115 · October 10, 2022, 8:17am

Since I like threads in this forum to end with an answer: I got it working myself. You first have to create a named version of the VGG16 backbone and then construct the FPN around it. The following code snippet should work:

from torchvision.models.detection.backbone_utils import BackboneWithFPN
import torch
import torch.nn as nn

class VGG_named(nn.Module):
    def __init__(self):
        super().__init__()

        self.layer_1 = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
        )

        self.layer_2 = nn.Sequential(
            nn.Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
        )

        self.layer_3 = nn.Sequential(
            nn.Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
        )

        self.layer_4 = nn.Sequential(
            nn.Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
        )

        self.layer_5 = nn.Sequential(
            nn.Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(inplace=True)
        )
        
    def forward(self, x):
        x = self.layer_1(x)
        x = self.layer_2(x)
        x = self.layer_3(x)
        x = self.layer_4(x)
        x = self.layer_5(x)
        return x

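if __name__ == "__main__":
    vgg16 = VGG_named()
    vgg16.out_channels = 512

    # Channel counts of the feature maps returned by layer_2 ... layer_5
    in_channels_list = [128, 256, 512, 512]
    # Map the named child modules to the keys used in the FPN output dict
    return_layers = {'layer_2': '0', 'layer_3': '1', 'layer_4': '2', 'layer_5': '3'}

    # Channel count of every FPN output level
    out_channels = 256

    backbone = BackboneWithFPN(vgg16, return_layers, in_channels_list, out_channels)
    img = torch.randn((4, 3, 512, 512))
    out = backbone(img)  # ordered dict of feature maps, one entry per level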

JarlLemmens · October 13, 2022, 12:33pm

Hi,

Did you manage to get this running with Faster R-CNN? I’m looking for a similar implementation, but I'd need the VGG to be pre-trained. Any idea if this is possible?

SeucheAchat9115 · November 18, 2022, 9:35am

You can convert the weights from the unnamed to the named VGG. Then you can load the backbone into FasterRCNN and train it with FPN.