FPN with VGG16 Backbone for FasterRCNN

Hi, I would like to use the VGG16 backbone in combination with FPN in the Faster R-CNN object detector.
I load the VGG16 as follows:

import torchvision

backbone = torchvision.models.vgg16()
# Drop the final max-pooling layer of the feature extractor
backbone = backbone.features[:-1]
backbone.out_channels = 512

Now I would like to attach an FPN to the VGG as follows:

backbone = BackboneWithFPN(backbone, return_layers, in_channels_list, out_channels)

which I found in the documentation. Can anybody help me construct return_layers, in_channels_list and out_channels for the VGG16 example? I only found examples for ResNet, and I cannot get it running for VGG at the moment.
I would be glad for any kind of help here.
Thanks in advance.

Since I like threads here in the forum to have answers: I got it working myself. You first have to create a named version of the VGG16 backbone and then construct the FPN around it. The following code snippet should work:

from torchvision.models.detection.backbone_utils import BackboneWithFPN
import torch
import torch.nn as nn

class VGG_named(nn.Module):
    def __init__(self):
        super().__init__()

        self.layer_1 = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
        )

        self.layer_2 = nn.Sequential(
            nn.Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
        )

        self.layer_3 = nn.Sequential(
            nn.Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
        )

        self.layer_4 = nn.Sequential(
            nn.Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
        )

        self.layer_5 = nn.Sequential(
            nn.Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(inplace=True)
        )
        
    def forward(self, x):
        x = self.layer_1(x)
        x = self.layer_2(x)
        x = self.layer_3(x)
        x = self.layer_4(x)
        x = self.layer_5(x)
        return x

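if __name__ == "__main__":
    vgg16 = VGG_named()
    vgg16.out_channels = 512

    # Output channels of layer_2 ... layer_5, in the same order as return_layers
    in_channels_list = [128, 256, 512, 512]
    # Keys: child module names of VGG_named; values: names of the returned feature maps
    return_layers = {'layer_2': '0', 'layer_3': '1', 'layer_4': '2', 'layer_5': '3'}

    # Number of channels of every FPN output map
    out_channels = 256

    backbone = BackboneWithFPN(vgg16, return_layers, in_channels_list, out_channels)
    img = torch.randn((4, 3, 512, 512))
    # out is an ordered dict mapping feature-map names to tensors
    out = backbone(img)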

Hi,

Did you manage to run this with Faster R-CNN? I'm looking for a similar implementation, but I'd need the VGG to be pre-trained. Any idea if this is possible?

You can convert the weights from the unnamed to the named VGG. Then you can load the backbone into FasterRCNN and train it with FPN.
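
One possible way to do the conversion, as a minimal sketch: both networks hold the same convolutions in the same order, only the state-dict keys differ, so the tensors can be copied by position. `copy_weights_in_order` is a helper name made up for this post, and `flat` / `Named` are small stand-ins for the unnamed and the named VGG so that the example runs without downloading anything.

```python
import torch
import torch.nn as nn


def copy_weights_in_order(src: nn.Module, dst: nn.Module) -> None:
    """Copy every tensor in src.state_dict() into dst by position, not by key."""
    src_state = src.state_dict()
    dst_state = dst.state_dict()
    if len(src_state) != len(dst_state):
        raise ValueError(
            f"{len(src_state)} source tensors vs {len(dst_state)} target tensors"
        )

    converted = {}
    for (src_key, src_tensor), (dst_key, dst_tensor) in zip(
        src_state.items(), dst_state.items()
    ):
        # Guard against silently pairing up layers that do not correspond.
        if src_tensor.shape != dst_tensor.shape:
            raise ValueError(f"shape mismatch: {src_key} -> {dst_key}")
        converted[dst_key] = src_tensor.clone()
    dst.load_state_dict(converted)


# Stand-in for the unnamed network: one flat Sequential with numeric keys.
flat = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
)


class Named(nn.Module):
    """Stand-in for the named network: same layers, grouped under names."""

    def __init__(self):
        super().__init__()
        self.layer_1 = nn.Sequential(
            nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2)
        )
        self.layer_2 = nn.Sequential(nn.Conv2d(8, 16, 3, padding=1), nn.ReLU())

    def forward(self, x):
        return self.layer_2(self.layer_1(x))


named = Named()
copy_weights_in_order(flat, named)

# After the copy both networks should compute the same function.
x = torch.randn(2, 3, 32, 32)
with torch.no_grad():
    same = torch.allclose(flat(x), named(x))
print("outputs match:", same)
```

With the classes from this thread, the source would be the features module of a pretrained torchvision VGG16 and the target a VGG_named instance. The final MaxPool2d that was dropped has no parameters, so the tensor counts should still line up. I have only checked the idea on the small stand-ins above, so treat it as a starting point.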