Fine-tuning Mask R-CNN results


I used the Mask R-CNN fine-tuning tutorial to train a model that distinguishes weeds from crops on the CFWID dataset (available on GitHub), and the results look a little weird. I am aware that my training set (60 samples) is far too small for the model to work well.
[Screenshot from 2020-11-25 22-20-59]
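For reference, the tutorial's detection models expect one target dict per image with `boxes` ([N, 4] as x1, y1, x2, y2), `labels` ([N], where 0 is reserved for background) and `masks` ([N, H, W]), and `num_classes` must count the background. A common slip when adapting the tutorial (which uses background + person, i.e. 2 classes) is keeping `num_classes=2` or labelling every object 1. The sketch below is a hypothetical sanity check over plain Python lists (the names `CLASS_IDS` and `check_target` and the crop/weed label mapping are assumptions, not part of the tutorial); with tensors you would call `.tolist()` first.

```python
# Hypothetical label mapping: 0 is reserved for background by torchvision.
CLASS_IDS = {"crop": 1, "weed": 2}
NUM_CLASSES = len(CLASS_IDS) + 1  # background + crop + weed = 3


def check_target(target, height, width, num_classes=NUM_CLASSES):
    """Return a list of problems in one image's target (empty list if none).

    Plain lists stand in for the tutorial's tensors:
    boxes float [N, 4], labels int64 [N], masks uint8 [N, H, W].
    """
    problems = []
    boxes, labels, masks = target["boxes"], target["labels"], target["masks"]
    if not (len(boxes) == len(labels) == len(masks)):
        problems.append("boxes, labels and masks have different lengths")
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        # Boxes must have positive width/height and lie inside the image.
        if not (0 <= x1 < x2 <= width and 0 <= y1 < y2 <= height):
            problems.append(f"box {i} is degenerate or outside the image")
    for i, label in enumerate(labels):
        # Foreground labels must be 1 .. num_classes - 1.
        if not 1 <= label < num_classes:
            problems.append(f"label {i} = {label} is not a foreground class id")
    for i, mask in enumerate(masks):
        # Each instance mask must have the same H x W as the image.
        if len(mask) != height or any(len(row) != width for row in mask):
            problems.append(f"mask {i} does not match the image size")
    return problems


# Usage on a toy 3 x 4 image with one "weed" instance.
mask = [[0] * 4 for _ in range(3)]
good = {"boxes": [[0, 0, 2, 2]], "labels": [CLASS_IDS["weed"]], "masks": [mask]}
bad = {"boxes": [[0, 0, 2, 2]], "labels": [0], "masks": [mask]}
print(check_target(good, height=3, width=4))  # prints []
print(check_target(bad, height=3, width=4))
```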

The model labels some regions as both weed and crop, and the two predictions for such a region overlap heavily (high IoU). Did I make a mistake when converting the data into the expected format? Does Mask R-CNN perform non-maximum suppression to avoid such predictions? Thank you in advance.
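As far as I know, torchvision's Mask R-CNN does run NMS in `RoIHeads.postprocess_detections`, but through `torchvision.ops.batched_nms`, which suppresses boxes only within the same class. A weed box and a crop box on the same plant are therefore never compared, so overlapping cross-class predictions are expected behaviour rather than necessarily a data bug. The pure-Python sketch below (helper names `iou`, `nms`, `per_class_nms` and the toy boxes are hypothetical) contrasts per-class NMS with a class-agnostic pass; on real outputs, `torchvision.ops.nms(boxes, scores, iou_threshold)` applied while ignoring the labels gives the class-agnostic version.

```python
def iou(a, b):
    """Intersection over union of two boxes given as [x1, y1, x2, y2]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold):
    """Greedy class-agnostic NMS: indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Drop a box if it overlaps an already kept box by more than the threshold.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


def per_class_nms(boxes, scores, labels, iou_threshold):
    """NMS run separately per label, so different classes never suppress each other."""
    keep = []
    for c in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        kept = nms([boxes[i] for i in idx], [scores[i] for i in idx], iou_threshold)
        keep.extend(idx[k] for k in kept)
    return sorted(keep, key=lambda i: scores[i], reverse=True)


# Toy detections: boxes 0 and 1 cover the same plant (IoU ~ 0.82).
boxes = [[10, 10, 50, 50], [12, 12, 52, 52], [100, 100, 140, 140]]
scores = [0.9, 0.8, 0.7]
labels = [2, 1, 2]  # weed, crop, weed

print(per_class_nms(boxes, scores, labels, 0.5))  # prints [0, 1, 2]
print(nms(boxes, scores, 0.5))                    # prints [0, 2]
```

If you filter the real model output this way, index `masks` and `labels` with the same kept indices so each region keeps only its highest-scoring label.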