Creating and training a custom network based on pretrained Faster-RCNN

I’m trying to create a custom network with the pretrained fasterrcnn_resnet50_fpn from torchvision. The aim is to insert new layers between the FPN and the RPN. For this, I create a new nn.Module class and divide the original model in two, as shown in the snippet below.

import torch.nn as nn
from torchvision.models.detection import fasterrcnn_resnet50_fpn

fasterRcnn = fasterrcnn_resnet50_fpn(pretrained=False, progress=True, num_classes=15, pretrained_backbone=True)


class CustomFasterRcnn(nn.Module):

  def __init__(self):
    super(CustomFasterRcnn, self).__init__()

    self.resnet50WithFpn = nn.Sequential(*list(fasterRcnn.children())[0:2])
    self.RPN = nn.Sequential(*list(fasterRcnn.children())[2:])

  def forward(self, x):

    x = self.resnet50WithFpn(x)
    x = self.RPN(x)

    return x

From what I have seen on the internet, the forward method takes 2 arguments, self and x. However, the official PyTorch docs (torchvision.models — Torchvision 0.11.0 documentation) state that during training the model takes two arguments: images and targets.

Currently, it is giving an error because the targets argument is missing. So, should I change the forward method to take 3 arguments: self, x, and targets? If so, where should I pass the targets argument in the method?

Note: I am going to initialize new layers in the __init__ method and use them in the forward method between self.resnet50WithFpn(x) and self.RPN(x).

Splitting a model into nn.Sequential containers only works for simple models that initialize and use all their submodules sequentially, without any functional API calls in forward.
Even for fairly simple models such as ResNets, this approach would already miss the torch.flatten operation.
More advanced models such as FasterRCNN will most likely break with this approach.
You can find the forward definition of the base class here.

So are you saying there is no easy way to split FasterRCNN into parts and add new layers this way?

It seems I need to inherit from its base class to modify its behavior. Am I right?

Or can you suggest a new way to do such a thing?

What I actually want to achieve is to add new conv2d-relu layers between the FPN and the RPN, so that the resulting proposals can be classified by the head.

Yes, I don’t think splitting the entire model into nn.Sequential containers would easily work (certain “simple” submodules could work).

Yes, I believe this would be the proper way. To do so, check the second link, which points to the base class.

Assuming these layers won’t change the activation shape (which would thus allow you to add them without modifying other layers), you could try to replace the fpn with an nn.Sequential module containing the fpn as well as the additional layers.