Multi-label multi-class Faster RCNN for object detection

MultiLabelGuy · March 14, 2023, 11:36am

I am currently trying to detect objects that have multiple labels, where each label has its own set of classes. In other words, I want to detect object instances that have multiple attributes assigned to them. For example, in an image with multiple cars, I want to detect each car instance as well as its attributes, such as colour, number of wheels, etc. Currently, I only predict the individual attributes, but this leaves it ambiguous which object each attribute belongs to.

ptrblck · March 15, 2023, 5:18am

Your use case sounds like instance segmentation and e.g. Mask R-CNN might be a useful model to check out. This example shows how to visualize some outputs of this model.

MultiLabelGuy · March 15, 2023, 10:29am

Thanks for the answer. I wonder whether it is possible to assign multiple labels to the individual masks. Otherwise, I would have to segment the labels themselves and then reassemble them into individual objects by comparing their regions. Since I have to compare the regions of the individual masks, the performance would probably depend heavily on the pixel accuracy of the segments, especially in the case of occlusions.

Michal_Bogacz · March 15, 2023, 1:53pm

In models like Mask R-CNN you can define multiple heads for different purposes. These heads are extensions of the backbone, which produces embeddings of the image and of the objects in it.

If you look at the basic implementation of Mask R-CNN in the PyTorch repo, you can see that it is trained on bboxes, labels and masks. After training, it returns all three for the input data.

You can add additional heads for additional purposes.

github.com/pytorch/vision/blob/main/torchvision/models/detection/mask_rcnn.py

from collections import OrderedDict
from typing import Any, Callable, Optional

from torch import nn
from torchvision.ops import MultiScaleRoIAlign

from ...ops import misc as misc_nn_ops
from ...transforms._presets import ObjectDetection
from .._api import register_model, Weights, WeightsEnum
from .._meta import _COCO_CATEGORIES
from .._utils import _ovewrite_value_param, handle_legacy_interface
from ..resnet import resnet50, ResNet50_Weights
from ._utils import overwrite_eps
from .backbone_utils import _resnet_fpn_extractor, _validate_trainable_layers
from .faster_rcnn import _default_anchorgen, FasterRCNN, FastRCNNConvFCHead, RPNHead


__all__ = [
    "MaskRCNN",
    "MaskRCNN_ResNet50_FPN_Weights",

This file has been truncated.
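One way to read the "additional heads" suggestion: the RoI heads pool a fixed-size feature vector for every detected box, so attribute classifiers attached at that point produce one prediction per object, however many cars are in the image. The module below is a hypothetical sketch (`AttributeHeads`, the attribute names and the class counts are made up). The input size 1024 is assumed to match the box-head output, i.e. what `model.roi_heads.box_predictor.cls_score.in_features` returns for torchvision's default box heads. Wiring it into torchvision's `RoIHeads` (for example by subclassing it so the features of the kept boxes are passed through) is not shown.

```python
import torch
from torch import nn


class AttributeHeads(nn.Module):
    """Hypothetical per-instance attribute classifier.

    Takes one pooled feature vector per RoI and returns one set of logits per
    attribute, so every detected object gets its own attribute predictions.
    """

    def __init__(self, in_features: int, num_classes_per_attribute: dict[str, int]):
        super().__init__()
        # one linear classifier per attribute, all sharing the same RoI features
        self.heads = nn.ModuleDict(
            {name: nn.Linear(in_features, n) for name, n in num_classes_per_attribute.items()}
        )

    def forward(self, roi_features: torch.Tensor) -> dict[str, torch.Tensor]:
        # roi_features: (num_rois, in_features) -> {attribute: (num_rois, n_classes)}
        return {name: head(roi_features) for name, head in self.heads.items()}


# Hypothetical attribute vocabulary; the class counts are illustrative only.
attributes = {"colour": 8, "wheel_count": 5, "roof": 3}
heads = AttributeHeads(1024, attributes)

features = torch.randn(7, 1024)  # e.g. 7 RoIs, i.e. 7 detected cars in one image
logits = heads(features)
predicted = {name: l.argmax(dim=1) for name, l in logits.items()}  # one class per car
```

Because every attribute shares the same RoI features, the attribute predictions are tied to the box they were pooled from, which avoids the separate matching step between attribute masks and car masks.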

MultiLabelGuy · March 21, 2023, 3:49pm

Hello again, and thank you for your answer! Unfortunately, I still haven’t come up with a suitable solution. I had also considered using several heads, but the problem is that the number of objects varies from image to image. The structure of my problem is as follows:

The image contains a variable number of cars.
For each car, I want to extract the following information: car number, car colour, car length, car wall, car roof, wheel_count, load_obj1, load_obj2, load_obj3.
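With per-RoI heads, the variable number of cars mainly affects the targets, not the architecture: the loss is computed over the sampled RoIs, each of which is matched to one ground-truth car. A minimal sketch of such a loss, assuming hypothetical inputs: `logits` from per-RoI attribute heads, per-car integer targets for each attribute (with `-1` marking an attribute that is not annotated), and a `matched_idxs` tensor giving the ground-truth car for each RoI (`-1` for background). How `matched_idxs` would be obtained from torchvision's internal RoI sampling is not shown and is an assumption.

```python
import torch
import torch.nn.functional as F


def attribute_loss(logits, gt_attributes, matched_idxs, ignore_index=-1):
    """Hypothetical multi-attribute loss for a variable number of objects.

    logits:        {attr: (num_rois, n_classes)} from per-RoI attribute heads
    gt_attributes: {attr: (num_gt_cars,)} class index per ground-truth car,
                   ignore_index where the attribute is not annotated
    matched_idxs:  (num_rois,) ground-truth car index per RoI, -1 = background
    """
    fg = matched_idxs >= 0  # only RoIs matched to a real car contribute
    total = next(iter(logits.values())).new_zeros(())
    for name, attr_logits in logits.items():
        # look up each foreground RoI's target via the car it was matched to
        targets = gt_attributes[name][matched_idxs[fg]]
        if (targets != ignore_index).any():  # skip attributes with no valid target
            total = total + F.cross_entropy(attr_logits[fg], targets, ignore_index=ignore_index)
    return total


# Two cars, five RoIs: RoIs 0 and 1 -> car 0, RoI 3 -> car 1, RoIs 2 and 4 -> background.
logits = {
    "colour": torch.randn(5, 8, requires_grad=True),
    "wheel_count": torch.randn(5, 5, requires_grad=True),
}
gt = {"colour": torch.tensor([2, 5]), "wheel_count": torch.tensor([4, -1])}  # car 1 wheels unknown
matched = torch.tensor([0, 0, -1, 1, -1])
loss = attribute_loss(logits, gt, matched)
loss.backward()
```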

In my most recent approach, I have implemented a kind of template matching to check whether each attribute mask is contained in one of the whole-car masks. To do this, I calculate a similarity value between each attribute mask and each car mask and then assign the attributes to the cars based on these similarity values. The similarity scores are calculated as follows:

def get_similarity_score(mask, whole_car_mask):
    # determine to which degree mask is included in whole car mask:
    # sum up all values of mask where mask is smaller than or equal to whole car mask
    # and all values of whole car mask where mask is higher than whole car mask
    # (i.e. the sum of the element-wise minimum of both masks)
    similarity = mask[mask <= whole_car_mask].sum() + whole_car_mask[mask > whole_car_mask].sum()
    # normalize by the area of the attribute mask
    similarity = similarity / mask.sum()
    return similarity
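The score above equals the sum of the element-wise minimum of the two masks divided by the area of the attribute mask, i.e. the fraction of the attribute mask that lies inside the car mask. A minimal sketch that computes this for all attribute/car pairs at once and then assigns each attribute mask to the car that contains most of it; the helper names, the greedy rule and the 0.5 threshold are hypothetical choices, not part of the original code, and at least one car mask is assumed.

```python
import torch


def containment_matrix(attr_masks, car_masks, eps=1e-6):
    """Fraction of each attribute mask that lies inside each car mask.

    attr_masks: (A, H, W) soft or binary masks with values in [0, 1]
    car_masks:  (C, H, W) soft or binary masks with values in [0, 1], C >= 1
    returns:    (A, C) tensor, entry [a, c] = sum(min(attr_a, car_c)) / sum(attr_a)
    """
    attr_flat = attr_masks.flatten(1)  # (A, H*W)
    car_flat = car_masks.flatten(1)    # (C, H*W)
    # loop over cars so memory stays O(A * H * W) instead of O(A * C * H * W)
    overlap = torch.stack([torch.minimum(attr_flat, car).sum(dim=1) for car in car_flat], dim=1)
    return overlap / (attr_flat.sum(dim=1, keepdim=True) + eps)


def assign_attributes(attr_masks, car_masks, min_containment=0.5):
    """Hypothetical greedy rule: car containing most of each attribute mask, or -1."""
    scores = containment_matrix(attr_masks, car_masks)
    best_score, best_car = scores.max(dim=1)
    return torch.where(best_score >= min_containment, best_car, torch.full_like(best_car, -1))


# Toy 4x4 example: two cars, three attribute masks (the last one lies outside both cars)
cars = torch.zeros(2, 4, 4)
cars[0, :2, :2] = 1
cars[1, 2:, 2:] = 1
attr = torch.zeros(3, 4, 4)
attr[0, 0, :2] = 1
attr[1, 3, 2:] = 1
attr[2, :2, 2:] = 1
print(assign_attributes(attr, cars))
```

This only makes the existing matching faster and explicit about unassigned masks; it still depends on the pixel accuracy of both sets of masks.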

However, this approach does not perform well and is very error-prone. My model is currently defined as follows:

from torchvision import models
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

weights = models.detection.MaskRCNN_ResNet50_FPN_V2_Weights.DEFAULT
model = models.detection.maskrcnn_resnet50_fpn_v2(weights=weights, box_score_thresh=0.9)
# for predicting masks
in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
hidden_layer = 256
# define a new head for the detector with the required number of classes: 22 for the label-specific classes and 20 as the upper bound for the number of cars that can be present in a scene
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_features_mask, hidden_layer, 22 + 20)
# for predicting boxes
# get the number of input features
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, 22 + 20)