I used the Mask R-CNN fine-tuning tutorial to train a model that distinguishes weeds from crops on the CWFID dataset (available on GitHub), and the results look a bit weird. I am aware that the number of training samples in the dataset (60 in my case) is far too small for the model to work well.
The model predicts some regions as both weed and crop, and these predictions overlap heavily (high IoU). Did I make a mistake when converting the data to the expected format? Does Mask R-CNN perform some non-maximum suppression (NMS) to avoid such predictions? Thank you in advance.
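To make the NMS part of the question concrete, here is a minimal pure-Python sketch of two NMS variants. The helper names (`iou`, `nms`, `class_agnostic`) are hypothetical and not taken from the tutorial or torchvision. In per-class NMS a box only suppresses other boxes with the same label. In class-agnostic NMS any sufficiently overlapping box is suppressed regardless of label. Only the class-agnostic variant would remove a weed/crop pair covering the same region. torchvision exposes both building blocks as `torchvision.ops.batched_nms` (per class) and `torchvision.ops.nms` (ignores labels).

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], labels: Sequence[int],
        iou_thresh: float = 0.5, class_agnostic: bool = False) -> List[int]:
    """Greedy NMS (hypothetical helper); returns kept indices, highest score first.

    class_agnostic=False: a box is only suppressed by a kept box with the same
    label (per-class NMS). class_agnostic=True: labels are ignored.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        suppressed = any(
            (class_agnostic or labels[i] == labels[k])
            and iou(boxes[i], boxes[k]) > iou_thresh
            for k in keep
        )
        if not suppressed:
            keep.append(i)
    return keep


# Two near-identical boxes: label 1 = crop (score 0.9), label 2 = weed (score 0.8).
boxes = [(10.0, 10.0, 50.0, 50.0), (12.0, 12.0, 52.0, 52.0)]
scores = [0.9, 0.8]
labels = [1, 2]
print(nms(boxes, scores, labels))                       # per-class
print(nms(boxes, scores, labels, class_agnostic=True))  # class-agnostic
```

With these two boxes (IoU ≈ 0.82), per-class NMS keeps both detections, while the class-agnostic variant keeps only the higher-scoring crop box.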