Hello community, this is my first time posting here.
The Mask R-CNN implementation outputs masks, bounding boxes, and class labels. For my use case I would like to predict multiple attributes besides the class label (three, to be specific). For instance, if I were detecting bikes, I would like to further predict their attributes (sport bike, etc…). I can foresee three approaches:
- Keep a single class prediction per bounding box and create one class for every possible combination of the attributes (bike_sport_red, bike_leisure_blue, etc…). I don’t find this very appealing because it doesn’t scale.
- Modify or reimplement Mask R-CNN to accommodate multiple categories per bounding box. I am not sure how much effort this would entail, and I am wary of tinkering with it.
- Create my own Module that holds the Mask R-CNN module as a member variable and connects its feature-vector layers to my own classifiers.
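To put a number on the scalability concern with the first approach, here is a small count. Apart from "sport", "leisure", "red" and "blue", the attribute names and values are hypothetical and only there for illustration.

```python
from itertools import product

# Hypothetical attribute vocabularies; only "sport", "leisure", "red" and
# "blue" come from my example above, the rest are made up for illustration.
attributes = {
    "style": ["sport", "leisure", "commuter", "cargo"],
    "color": ["red", "blue", "green", "black", "white"],
    "size": ["small", "medium", "large"],
}

# First approach: one detector class per combination of attribute values.
combined_classes = [
    "bike_" + "_".join(combo) for combo in product(*attributes.values())
]

# Second and third approaches: one small classifier per attribute, so the
# number of outputs is the sum of the vocabulary sizes.
separate_outputs = sum(len(values) for values in attributes.values())

print(len(combined_classes))  # 60 = 4 * 5 * 3
print(separate_outputs)       # 12 = 4 + 5 + 3
```

The combined class count grows as the product of the vocabulary sizes, and each combined class also needs enough training examples of its own.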
Thinking out loud, the last option seems like the best approach. I would appreciate any feedback or guidance.
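To make the last option concrete, here is a minimal sketch of the classifier part in PyTorch. It assumes I can get one feature vector per detected box out of the detector. The feature size of 1024 is my assumption (I believe it matches the default box head in torchvision's Mask R-CNN), and `MultiAttributeHead`, `attribute_loss`, and the attribute names and sizes are hypothetical names of my own. How to hook this into the detector itself is not shown.

```python
import torch
from torch import nn
import torch.nn.functional as F


class MultiAttributeHead(nn.Module):
    """One linear classifier per attribute on a shared per-box feature vector."""

    def __init__(self, in_features, num_values_per_attribute):
        super().__init__()
        # A ModuleDict registers every classifier's parameters with this module.
        self.classifiers = nn.ModuleDict({
            name: nn.Linear(in_features, num_values)
            for name, num_values in num_values_per_attribute.items()
        })

    def forward(self, box_features):
        # box_features: (num_boxes, in_features)
        # Returns a dict of logits, each of shape (num_boxes, num_values).
        return {name: clf(box_features) for name, clf in self.classifiers.items()}


def attribute_loss(logits, targets):
    # Unweighted sum of one cross-entropy term per attribute.
    return sum(F.cross_entropy(logits[name], targets[name]) for name in logits)


# Stand-in for real detector features: 8 boxes, 1024 features each (assumed size).
head = MultiAttributeHead(1024, {"style": 4, "color": 5, "size": 3})
box_features = torch.randn(8, 1024)
logits = head(box_features)

# Random integer labels, only to show the expected target format.
targets = {
    name: torch.randint(0, out.shape[1], (8,)) for name, out in logits.items()
}
loss = attribute_loss(logits, targets)
loss.backward()
```

Each attribute gets its own softmax, so adding a fourth attribute would only mean one more entry in the dict rather than a new set of combined classes. Whether the attribute losses need different weights relative to the detection losses is something I would have to find out by experiment.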
Also, if anyone knows of a paper or implementation that does this, I would greatly appreciate hearing about it.