How to train dataset (image with mask, image without mask )?

Dataset consists of an image with a ground truth mask and an image without a ground truth mask.

I want to use a mask rcnn network to learn masks for images with groud truth masks, and only classification and regression for images without ground truth masks.

Can I train using a mask rcnn network?

If I create a branch in the network, I don’t understand how to create it.