Implementation of Faster R-CNN with ResNet50 + FPN as backbone

Hello everyone,

I have a question regarding the implementation of Faster R-CNN with ResNet50 + FPN as the backbone. I am using the implementation provided by torchvision (PyTorch):

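import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Load the pretrained detector and replace the box predictor for 3 classes
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, 3)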

I have also heard that the standard anchor sizes are 32, 64, 128, 256, 512 and the aspect ratios are 0.5, 1, 2.

But when I print the RPN head (model.rpn.head), it shows the following:

RPNHead(
  (conv): Sequential(
    (0): Conv2dNormActivation(
      (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): ReLU(inplace=True)
    )
  )
  (cls_logits): Conv2d(256, 3, kernel_size=(1, 1), stride=(1, 1))
  (bbox_pred): Conv2d(256, 12, kernel_size=(1, 1), stride=(1, 1))
)

However, if I use the same anchor sizes and ratios with the following program, the number of output channels of cls_logits increases from 3 to 15, and that of bbox_pred increases from 12 to 60:

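import torchvision
from torchvision.models.detection.anchor_utils import AnchorGenerator
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.rpn import RPNHead

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
# One tuple of sizes per feature-map level, each paired with three aspect ratios
sizes = ((64,), (156,), (300,), (512,), (800,))
aspect_ratios = ((0.5, 1.0, 2.0),) * len(sizes)
anchor_generator = AnchorGenerator(sizes=sizes, aspect_ratios=aspect_ratios)
model.rpn.anchor_generator = anchor_generator
# Rebuild the RPN head so its channel counts match the new anchor generator
model.rpn.head = RPNHead(256, anchor_generator.num_anchors_per_location()[0])
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=3)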
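
To make the channel counts easier to reason about, here is a minimal pure-Python sketch of how the number of anchors per location relates to the sizes and aspect_ratios tuples. It assumes the count for each feature-map level is len(sizes[i]) * len(aspect_ratios[i]), which is my understanding of what AnchorGenerator.num_anchors_per_location() does. The helper anchors_per_location is a hypothetical name for illustration, not a torchvision function. It also assumes cls_logits has one channel per anchor and bbox_pred has four.

```python
def anchors_per_location(sizes, aspect_ratios):
    """Count anchors per spatial location for each feature-map level.

    Assumption: one anchor per (size, ratio) pair on each level.
    """
    return [len(s) * len(r) for s, r in zip(sizes, aspect_ratios)]


ratios = (0.5, 1.0, 2.0)

# One size per FPN level (five levels): 1 size x 3 ratios per level.
per_level_sizes = ((32,), (64,), (128,), (256,), (512,))
per_level = anchors_per_location(per_level_sizes, (ratios,) * len(per_level_sizes))

# All five sizes on every level: 5 sizes x 3 ratios per level.
stacked_sizes = ((32, 64, 128, 256, 512),) * 5
stacked = anchors_per_location(stacked_sizes, (ratios,) * len(stacked_sizes))

# Assumed head layout: cls_logits has A channels, bbox_pred has 4 * A.
per_level_channels = (per_level[0], 4 * per_level[0])
stacked_channels = (stacked[0], 4 * stacked[0])

print(per_level, per_level_channels)  # [3, 3, 3, 3, 3] (3, 12)
print(stacked, stacked_channels)      # [15, 15, 15, 15, 15] (15, 60)
```

Under these assumptions, the 3 and 12 channels correspond to one anchor size per feature-map level, while 15 and 60 correspond to five sizes on a single level.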

The second program is used to test other sizes. My question is: why is num_anchors only 3 in the default implementation of Faster R-CNN, even though there are five anchor sizes and three aspect ratios?

I would be very happy if you can help me.