How does the PyTorch Faster R-CNN model share features between the RPN and Fast R-CNN?

In the original Faster R-CNN paper, the authors describe object detection as a two-stage pipeline: in the first stage, a CNN (the Region Proposal Network) generates region proposals; in the second stage, a Fast R-CNN detector takes these region proposals as input and, conceptually with its own CNN, carries out the actual object detection from the proposals.
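To make the structure I mean concrete, here is a minimal, dependency-free Python sketch of that two-stage flow. Every function here is a toy stand-in I made up (not the real torchvision API); the point is only that both stages consume the *same* backbone features, computed once:

```python
# Toy sketch of the two-stage Faster R-CNN forward pass.
# All functions are stand-ins for illustration, not the torchvision API.

def backbone(image):
    # Stand-in for the shared CNN (e.g. ResNet+FPN): one feature per pixel.
    return [pixel * 0.5 for pixel in image]

def rpn(features):
    # Stage 1: propose candidate boxes (here: index ranges) from the features.
    return [(i, i + 2) for i, f in enumerate(features) if f > 0.4]

def roi_head(features, proposals):
    # Stage 2: classify each proposal using the SAME features,
    # with no second pass through a separate backbone.
    return [("object" if sum(features[a:b]) > 1.0 else "background", (a, b))
            for a, b in proposals]

image = [0.2, 0.9, 1.3, 0.1, 0.8]
features = backbone(image)                  # computed once
proposals = rpn(features)                   # stage 1 reads the shared features
detections = roi_head(features, proposals)  # stage 2 reuses the same features
```

This is the inference-time picture; my question below is about how that sharing is arrived at during training.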

In the paper, they describe a few approaches (Alternating Training, Approximate Joint Training, Non-approximate Joint Training, 4-Step Alternating Training) that all lead to a single shared CNN. However, most of these approaches involve training two separate CNNs before eventually sharing the convolutional layers.
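My current understanding of joint training, expressed as a toy, dependency-free sketch (scalar "networks" and hand-derived gradients, purely illustrative, not the real implementation): a single forward pass through the shared backbone, the RPN and detection losses summed into one objective, and one optimizer step, with the backbone receiving gradient from both heads. The "approximate" part, as I understand it, is that the gradient with respect to the proposal coordinates themselves is ignored:

```python
# Toy sketch of approximate joint training: shared backbone, combined loss,
# one update. Names and targets are made up for illustration.

def training_step(weights, image, lr=0.1):
    # Single forward pass through the shared backbone (a scalar here).
    feature = weights["backbone"] * image
    # Both head losses are computed from the same shared feature.
    rpn_loss = (weights["rpn"] * feature - 1.0) ** 2
    det_loss = (weights["det"] * feature - 2.0) ** 2
    total = rpn_loss + det_loss  # losses are simply summed

    # Hand-derived gradients: the backbone weight accumulates gradient
    # from BOTH heads; the gradient w.r.t. proposal coordinates is ignored.
    g_rpn = 2 * (weights["rpn"] * feature - 1.0) * feature
    g_det = 2 * (weights["det"] * feature - 2.0) * feature
    g_backbone = (2 * (weights["rpn"] * feature - 1.0) * weights["rpn"]
                  + 2 * (weights["det"] * feature - 2.0) * weights["det"]) * image

    weights["rpn"] -= lr * g_rpn
    weights["det"] -= lr * g_det
    weights["backbone"] -= lr * g_backbone
    return total

weights = {"backbone": 0.5, "rpn": 1.0, "det": 1.0}
losses = [training_step(weights, image=1.0) for _ in range(200)]
```

If this is roughly what happens in practice, I would like confirmation; if not, I would like to know which of the paper's schemes is actually used.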

Which of the approaches described in the paper does the PyTorch Faster R-CNN model use for its training? I would like an intuition of what happens under the hood during training.