Torchvision Mask RCNN - Max_GT_Instances

I am trying Mask R-CNN based on the torchvision tutorial and am getting some weird results. The problem is very simple: detect the person in an image, where each image contains only one person (I am trying this as a proof of concept). On test images that contain one person, the model (trained for 300 epochs) should output one label and one mask of the person, but it outputs four labels and four overlapping masks: one shows the full-body silhouette, another shows the lower-body silhouette, and two more show the left and right halves of the body.

Is there a way to set Max_GT_Instances prior to training? If not, one can pick the label with the highest score, but that can only be done after training, as a post-processing step on the predictions.
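For reference, the post-training fallback I have in mind looks roughly like this. It is only a minimal sketch: `keep_top_detection` is a hypothetical helper name, and it assumes the usual torchvision detection output, i.e. one dict per image with `boxes`, `labels`, `scores`, and `masks` entries of equal length. It relies only on `len` and slicing, so it should work on tensors as well as on plain lists.

```python
def keep_top_detection(prediction):
    """Keep only the highest-scoring detection of a single image.

    `prediction` is assumed to be one per-image dict as returned by a
    torchvision detection model in eval mode (keys such as "boxes",
    "labels", "scores", "masks", all with the same first dimension).
    """
    scores = prediction["scores"]
    if len(scores) == 0:
        # Nothing was detected, so there is nothing to filter.
        return prediction
    # Index of the detection with the highest confidence score.
    best = max(range(len(scores)), key=lambda i: float(scores[i]))
    # Slice (rather than index) so every entry keeps its leading dimension.
    return {key: value[best:best + 1] for key, value in prediction.items()}
```

With a trained model this would be applied per image, for example `top = keep_top_detection(model([image])[0])`. It does not change what the model learns; it only discards the extra detections afterwards.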

Max_GT_Instances is the maximum number of instances that can be detected in one image. If the number of instances per image is limited, it can be set to the maximum number of instances that can occur in an image. This helps reduce false positives and shortens training time.