If there are C classes, each RoI outputs C scores and C bounding box predictions at test time (two tensors of size (1, C) and (4, C), respectively) in the postprocess_detections method. Non-max suppression is applied per class (i.e. boxes whose overlap exceeds the NMS threshold are both kept if they belong to different classes). But the normalization function is not class-independent:
pred_scores = F.softmax(class_logits, -1)
So if an RoI matches two positive classes, the pred_scores vector will be, e.g., [0.9, 0.1], and later both of these scores are compared to box_score_thresh. Because softmax forces the scores to sum to 1, a high score for one class pushes the other down, so one of them is very likely to be rejected. Therefore, I don't quite understand this implementation. It should be either:
pred_scores = torch.sigmoid(class_logits)
preds = torch.nonzero(pred_scores > box_score_thresh)
to compute the scores independently, or
preds = class_logits.max(-1)
preds.values[preds.indices>0].sigmoid()>box_score_thresh
to extract the best prediction from every RoI. Then the predictions will be genuinely independent. I think this needs to be re-implemented, or at least exposed as an argument so the user can choose. Mask predictions are already computed independently in this way.
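To make the difference concrete, here is a small framework-free sketch in plain Python (so the arithmetic is visible without PyTorch). The logits and the threshold of 0.5 are made-up illustrative values, not library defaults, and index 0 is assumed to be the background class. It compares which foreground classes survive the threshold under softmax, under per-class sigmoid, and under the "best class per RoI" variant.

```python
import math


def softmax(logits):
    # Numerically stable softmax; plays the role of F.softmax(class_logits, -1).
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    # Plays the role of torch.sigmoid applied to a single logit.
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical logits for one RoI: [background, class 1, class 2].
class_logits = [-2.0, 3.0, 2.5]
box_score_thresh = 0.5  # illustrative threshold, an assumption for this example
foreground = range(1, len(class_logits))

# Softmax: scores are coupled, since they must sum to 1.
softmax_scores = softmax(class_logits)
kept_softmax = [c for c in foreground if softmax_scores[c] > box_score_thresh]

# Per-class sigmoid: each class is scored independently of the others.
sigmoid_scores = [sigmoid(x) for x in class_logits]
kept_sigmoid = [c for c in foreground if sigmoid_scores[c] > box_score_thresh]

# Best class per RoI: take the argmax logit, drop it if it is background.
best = max(range(len(class_logits)), key=lambda c: class_logits[c])
kept_best = [best] if best > 0 and sigmoid(class_logits[best]) > box_score_thresh else []

print("softmax keeps:", kept_softmax)
print("sigmoid keeps:", kept_sigmoid)
print("best-class keeps:", kept_best)
```

With these numbers, both foreground logits are strongly positive, yet softmax keeps only class 1 because class 2's share of the probability mass falls below the threshold. The sigmoid variant keeps both, and the best-class variant keeps exactly one by construction.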