Question about source code in faster_rcnn.py

brifuture · July 16, 2020, 6:47am

Hi, I have a question about how the default RPNHead is constructed. The following source code is from torchvision==0.6.1, in models/detection/faster_rcnn.py around line 186:

```python
if rpn_head is None:
    rpn_head = RPNHead(
        out_channels, rpn_anchor_generator.num_anchors_per_location()[0]
    )
```

My question is: why is the RPNHead constructed with num_anchors set to rpn_anchor_generator.num_anchors_per_location()[0] instead of sum(rpn_anchor_generator.num_anchors_per_location())?

For example, suppose I construct rpn_anchor_generator with the following code:

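```python
from torchvision.models.detection.rpn import AnchorGenerator

# One anchor size per feature map level, each with three aspect ratios
anchor_sizes = ((32,), (64,), (128,), (256,), (512,))
aspect_ratios = ((0.5, 1.0, 2.0),) * len(anchor_sizes)
rpn_anchor_generator = AnchorGenerator(
    anchor_sizes, aspect_ratios
)
```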

Shouldn't I get 15 anchors per location on the original image? But rpn_anchor_generator.num_anchors_per_location()[0] gives only 3 anchors per location.
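To make the two numbers in the question concrete, here is a small pure-Python sketch of the counting. It assumes that num_anchors_per_location() returns one entry per feature map level, equal to the number of sizes times the number of aspect ratios for that level; the helper name anchors_per_level is hypothetical and only mirrors that arithmetic, it is not a torchvision function.

```python
def anchors_per_level(anchor_sizes, aspect_ratios):
    """Hypothetical helper: anchors at each location, one entry per level.

    Assumes each level contributes len(sizes) * len(ratios) anchors,
    mirroring (not calling) AnchorGenerator.num_anchors_per_location().
    """
    return [len(s) * len(r) for s, r in zip(anchor_sizes, aspect_ratios)]


anchor_sizes = ((32,), (64,), (128,), (256,), (512,))
aspect_ratios = ((0.5, 1.0, 2.0),) * len(anchor_sizes)

counts = anchors_per_level(anchor_sizes, aspect_ratios)
print(counts)       # [3, 3, 3, 3, 3]
print(sum(counts))  # 15
print(counts[0])    # 3
```

The 15 only appears when the per-level counts are added together; each individual level has 3.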

brifuture · July 26, 2020, 7:17am

OK, I figured out the reason. Here is the forward method of torchvision.models.detection.rpn.RPNHead:

```python
    def forward(self, x):
        # type: (List[Tensor]) -> Tuple[List[Tensor], List[Tensor]]
        logits = []
        bbox_reg = []
        # x holds one feature map per level; the same conv layers run on each
        for feature in x:
            t = F.relu(self.conv(feature))
            logits.append(self.cls_logits(t))
            bbox_reg.append(self.bbox_pred(t))
        return logits, bbox_reg
```
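Here is a pure-Python sketch of the shapes that loop produces, so it can be followed without torch. The feature map sizes are made-up example values (real sizes depend on the input image and backbone strides). The channel counts follow RPNHead's layers: cls_logits outputs num_anchors channels and bbox_pred outputs 4 * num_anchors channels, while the 3x3 conv keeps the spatial size. The helper head_output_shapes is hypothetical.

```python
# Hypothetical (C, H, W) shapes for five FPN levels; illustrative values only.
feature_shapes = [(256, 200, 200), (256, 100, 100), (256, 50, 50),
                  (256, 25, 25), (256, 13, 13)]
num_anchors = 3  # per location, per level: one size x three aspect ratios


def head_output_shapes(shapes, num_anchors):
    """Trace output shapes of one head shared by every level (hypothetical)."""
    out = []
    for _, h, w in shapes:
        # The same layers run on each level, so channel counts never change.
        logits_shape = (num_anchors, h, w)    # one objectness score per anchor
        bbox_shape = (4 * num_anchors, h, w)  # four box deltas per anchor
        out.append((logits_shape, bbox_shape))
    return out


shapes = head_output_shapes(feature_shapes, num_anchors)
for logits_shape, bbox_shape in shapes:
    print(logits_shape, bbox_shape)

# Total anchors over the whole image: per-location count times locations, summed.
total = sum(num_anchors * h * w for _, h, w in feature_shapes)
print(total)
```

The per-location count stays 3 on every level; the "many anchors" effect comes from the number of locations summed across levels, not from a larger head.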

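The for loop iterates over the feature maps (one per FPN level), not over the anchor sizes. Each entry of anchor_sizes = ((32,), (64,), (128,), (256,), (512,)) is assigned to one of those levels, so every location on a given feature map has only 3 anchors (one size × 3 aspect ratios), not 15. Because the same RPNHead is applied to every level, and all levels share the same aspect ratios (and therefore the same anchor count), the head only needs rpn_anchor_generator.num_anchors_per_location()[0] as num_anchors.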
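This reasoning relies on an invariant: taking index [0] is only valid when every level has the same anchor count. A minimal sketch of that check follows; shared_head_num_anchors is a hypothetical helper, and the default constructor shown in the question simply takes [0] without it.

```python
def shared_head_num_anchors(counts):
    """Hypothetical check: return the per-location anchor count for a head
    shared across all levels, or raise if the levels disagree."""
    if not counts:
        raise ValueError("need at least one feature map level")
    if len(set(counts)) != 1:
        raise ValueError(f"levels have different anchor counts: {counts}")
    return counts[0]


print(shared_head_num_anchors([3, 3, 3, 3, 3]))  # 3

try:
    # e.g. if one level were given six aspect ratios instead of three
    shared_head_num_anchors([3, 3, 3, 3, 6])
except ValueError as err:
    print(err)  # levels have different anchor counts: [3, 3, 3, 3, 6]
```

With a configuration like the second call, a single shared head with one output width would no longer match every level.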