Why are weights/bias initialized like this for the RPN?

In the RPN() module, why are the weights initialized like this? Am I missing something from the paper? Why this scheme rather than, say, Kaiming initialization?
I also don’t understand why the bias is set to 0.
Can someone help?

for layer in self.children():
    torch.nn.init.normal_(layer.weight, std=0.01)  # type: ignore[arg-type]
    torch.nn.init.constant_(layer.bias, 0)  # type: ignore[arg-type]
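For context, here is a minimal side-by-side sketch of the two schemes I’m comparing (the 256-channel Conv2d here is a made-up example, not from the repo):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

conv_a = nn.Conv2d(256, 256, kernel_size=3, padding=1)
conv_b = nn.Conv2d(256, 256, kernel_size=3, padding=1)

# What the repo does: zero-mean Gaussian with std=0.01, bias zeroed
nn.init.normal_(conv_a.weight, std=0.01)
nn.init.constant_(conv_a.bias, 0)

# What I expected instead: Kaiming (He) init, which picks the std
# from the layer's fan-in rather than using a fixed 0.01
nn.init.kaiming_normal_(conv_b.weight, nonlinearity="relu")

print(conv_a.weight.std().item())  # close to 0.01
print(conv_b.weight.std().item())  # larger; depends on fan-in
```

So the question is really why the fixed std=0.01 Gaussian is preferred here over the fan-in-scaled one.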

This is from the RPN module. Am I missing something obvious?

Another doubt, also inside RPN():

self.bbox_pred = nn.Conv2d(in_channels, num_anchors * 4, kernel_size=1, stride=1)

Why are we doing ‘num_anchors * 4’ here? Where is this 4 coming from?
In detectron2, by contrast, the RPN defines it as

self.anchor_deltas = nn.Conv2d(
    in_channels, num_cell_anchors * box_dim, kernel_size=1, stride=1
)

Here it is “num_cell_anchors * box_dim”.
(detectron2/rpn.py at 5e2a1ecccd228227c5a605c0a98d58e1b2db3640 · facebookresearch/detectron2 · GitHub)
I don’t understand what’s happening here.
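My guess so far is that the 4 is the number of box-regression deltas per anchor (dx, dy, dw, dh), so the two definitions would match when box_dim == 4. Here is a shape sketch of that reading (channel counts and anchor count are made-up values, not from either repo):

```python
import torch
import torch.nn as nn

num_anchors = 3   # anchors per spatial location (made-up value)
box_dim = 4       # one (dx, dy, dw, dh) delta set per anchor, if my guess is right
in_channels = 256

bbox_pred = nn.Conv2d(in_channels, num_anchors * box_dim, kernel_size=1, stride=1)

feat = torch.randn(1, in_channels, 50, 50)  # dummy feature map
out = bbox_pred(feat)
print(out.shape)  # torch.Size([1, 12, 50, 50]) -> 4 deltas for each of the 3 anchors
```

Is that the right way to read it, and is detectron2’s box_dim just there so the same head works for box parameterizations with a different number of values?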