If there are C classes, each RoI outputs C score predictions and C bounding box predictions (two tensors, of size (1, C) and (C, 4) respectively) at the test stage (postprocess_detections method). Non-max suppression is done independently per class (i.e. boxes whose overlap exceeds the NMS threshold are still kept if they belong to different classes). But the normalization function is not class-independent:
pred_scores = F.softmax(class_logits, -1)
So if an RoI has two positive classes, the pred_scores vector will be, e.g., [0.9, 0.1], and at some point both of these scores will be compared to box_score_thresh. Because softmax forces the scores to sum to 1, one of them is very likely to be rejected. Therefore, I don’t quite understand this implementation. It should be either:
pred_scores = torch.sigmoid(class_logits)
preds = torch.nonzero(pred_scores > box_score_thresh)
to compute the scores independently, or
preds = class_logits.max(-1)
keep = preds.values[preds.indices > 0].sigmoid() > box_score_thresh
to extract the best prediction from every RoI. Then the predictions will be genuinely independent. I think this needs to be re-implemented, or at least exposed as an argument so the user can choose the behaviour. Mask predictions are already computed independently in this way.
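To make the difference concrete, here is a minimal sketch comparing the three scoring schemes on a single RoI. It is written in plain Python so it runs without PyTorch; the softmax and sigmoid helpers are stand-ins for F.softmax and torch.sigmoid, and the logits and the threshold value are made up for illustration (0.5 is not the library default).

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    """Logistic function applied to a single logit."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical logits for one RoI: [background, class 1, class 2].
class_logits = [-4.0, 5.0, 3.0]
box_score_thresh = 0.5  # illustrative threshold, assumed for this example

# Softmax scoring: scores are coupled and sum to 1 across classes.
softmax_scores = softmax(class_logits)
kept_softmax = [c for c, s in enumerate(softmax_scores)
                if c > 0 and s > box_score_thresh]

# Alternative 1: sigmoid scoring, each class is judged on its own logit.
sigmoid_scores = [sigmoid(x) for x in class_logits]
kept_sigmoid = [c for c, s in enumerate(sigmoid_scores)
                if c > 0 and s > box_score_thresh]

# Alternative 2: keep only the best class per RoI, unless it is background.
best = max(range(len(class_logits)), key=lambda c: class_logits[c])
kept_best = [best] if best > 0 and sigmoid(class_logits[best]) > box_score_thresh else []

print("softmax keeps classes:", kept_softmax)
print("sigmoid keeps classes:", kept_sigmoid)
print("best-class keeps:", kept_best)
```

With these logits, class 2 has a strongly positive logit, yet softmax scoring drops it because class 1 takes most of the probability mass. Sigmoid scoring keeps both classes, and the best-class variant returns exactly one prediction per RoI.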