MaskRCNN Mask prediction question

I am studying MaskRCNN and Detectron2. My understanding is that the mask head and the bbox regression head make independent predictions and are trained in parallel with a joint loss function.

The RPN gives an approximate bbox, which is fed into both heads. As far as I understand, the mask head shouldn’t have any knowledge of the bbox regression head’s final predicted bbox.

But I notice that the predicted mask doesn’t extend outside the final predicted bbox (is this even true?). Why does the predicted mask fit neatly inside the final predicted bbox? It seems to me that the predicted mask should instead be bounded by the rough RPN bbox, which could extend beyond the final bbox.
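
To make the question concrete, here is a minimal sketch of my mental model. It assumes the mask head outputs a small fixed-size grid defined relative to some box, and that the grid is then pasted back into the image. `paste_mask` and `mask_exceeds_box` are hypothetical helpers written for illustration, not Detectron2 functions, and the boxes are made-up integer pixel coordinates.

```python
def paste_mask(small_mask, box, image_h, image_w):
    """Paste a small box-relative mask onto a full-size canvas (nearest neighbour)."""
    x0, y0, x1, y1 = box  # assumed integer pixels, x1 > x0 and y1 > y0
    mh, mw = len(small_mask), len(small_mask[0])
    canvas = [[0] * image_w for _ in range(image_h)]
    for y in range(max(y0, 0), min(y1, image_h)):
        for x in range(max(x0, 0), min(x1, image_w)):
            # Map the image pixel back to a cell of the box-relative grid.
            my = (y - y0) * mh // (y1 - y0)
            mx = (x - x0) * mw // (x1 - x0)
            canvas[y][x] = small_mask[my][mx]
    return canvas


def mask_exceeds_box(full_mask, box):
    """Return True if any foreground pixel lies outside the box."""
    x0, y0, x1, y1 = box
    return any(
        value and not (x0 <= x < x1 and y0 <= y < y1)
        for y, row in enumerate(full_mask)
        for x, value in enumerate(row)
    )


# A 4x4 mask that is foreground everywhere (illustrative values only).
small = [[1] * 4 for _ in range(4)]
rough_box = (2, 2, 10, 10)  # stand-in for the rough RPN box
final_box = (3, 3, 8, 8)    # stand-in for the final predicted box

pasted_rough = paste_mask(small, rough_box, 12, 12)
pasted_final = paste_mask(small, final_box, 12, 12)

print(mask_exceeds_box(pasted_rough, final_box))  # True
print(mask_exceeds_box(pasted_final, final_box))  # False
```

In this sketch the mask can never leave whichever box it is pasted into. If the mask were defined relative to the rough RPN box, I would expect the first result, but what I observe looks like the second.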

Please help me understand.