Faster R-CNN input question

diningeachox · January 29, 2021, 8:18pm

Hi Everyone,

This is my first post here, and I’m new to PyTorch in general. I’m wondering if you can help me understand the Faster R-CNN documentation, specifically the TorchVision Object Detection Finetuning Tutorial (PyTorch Tutorials 1.7.1 documentation).

I’m trying to train an object detection model for heart conditions/anomalies on chest X-rays. I expect a large percentage of the data will be from healthy people (i.e. no objects to detect). In this case, what should the bounding box be? As I understand it, the torchvision model still expects a nontrivial bounding box even if no object is present. Is that right? If so, would I just take the bounding box to be the whole image? Or something else?

Thank you in advance for any tips or assistance.

JamesDickens · January 30, 2021, 7:03am

The Faster R-CNN training process expects a certain percentage of positive as well as negative anchors in the region proposal batch fed to the RoI heads, and if the bounding box dicts are empty it will throw an error, if I remember correctly.
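
For reference, here is a minimal sketch of what an "empty" target looks like in the dict format the tutorial uses. The helper name `make_target` is made up for illustration. As far as I know, torchvision releases from 0.6 onward accept a target with zero boxes as a negative sample, so it is worth checking your installed version before patching anything.

```python
import torch

def make_target(boxes_xyxy, labels):
    """Build a torchvision-style detection target (hypothetical helper).

    boxes_xyxy: list of [x1, y1, x2, y2] boxes, possibly empty.
    labels: list of integer class ids, one per box (0 is reserved for background).
    """
    if len(boxes_xyxy) == 0:
        # An image with nothing to detect: boxes must still have shape (0, 4).
        boxes = torch.zeros((0, 4), dtype=torch.float32)
    else:
        boxes = torch.as_tensor(boxes_xyxy, dtype=torch.float32)
    return {
        "boxes": boxes,
        "labels": torch.as_tensor(labels, dtype=torch.int64),
    }

healthy = make_target([], [])                               # no findings
finding = make_target([[30.0, 40.0, 120.0, 160.0]], [1])    # one anomaly box
print(healthy["boxes"].shape, healthy["labels"].shape)
```

The important detail is the `(0, 4)` shape for `boxes`; a plain empty tensor of shape `(0,)` is not the same thing.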

I think you will have to tweak the Generalized RCNN code as well as the RPN code in torchvision to deal with the scenario where no anchor has an IoU with a ground-truth bounding box larger than the threshold. The RPN will still output a batch of region proposals, namely those whose objectness scores are closest to 1.
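
To make the "no anchor above the threshold" case concrete, here is a rough pure-Python sketch of IoU-based anchor labelling. It is not the torchvision implementation: the function names are my own, the 0.7/0.3 thresholds are the usual Faster R-CNN defaults used as an assumption, and the extra rule that keeps the best anchor for each ground-truth box is left out.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def label_anchors(anchors, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Return 1 (positive), 0 (negative) or -1 (ignored) for each anchor."""
    labels = []
    for anchor in anchors:
        # With no ground-truth boxes the best IoU defaults to 0.0.
        best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
        if best >= pos_thresh:
            labels.append(1)
        elif best < neg_thresh:
            labels.append(0)
        else:
            labels.append(-1)
    return labels

anchors = [(0, 0, 10, 10), (0, 0, 10, 20), (50, 50, 60, 60)]
with_finding = label_anchors(anchors, [(0, 0, 10, 10)])
healthy_image = label_anchors(anchors, [])
```

For an image with no ground-truth boxes every anchor ends up negative, which is exactly the situation the sampling code has to tolerate.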

I encountered this issue since the MS COCO 2014 object detection dataset has some images with no boxes. Most pre-processing I saw simply discarded those images, but in your case that probably doesn’t make sense since you want to be able to output something like “no anomalies” for an image, not to mention the fact that in order to eliminate false positives your model needs a decent amount of healthy person images.

What you ultimately want for an image with no anomalies is for the region proposal network to learn that all anchors should have an objectness value of 0, and for the classifier head to predict the background class for every region proposal output by the RPN (with the regression losses for both the RPN and the RoI heads disabled in this case).
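
A toy version of that loss, in plain Python so the behaviour is easy to check by hand. This is only a sketch under my own assumptions (per-anchor scalar regression errors, labels of 1/0/-1, both terms normalised by the number of sampled anchors), not the torchvision code.

```python
import math

def bce_with_logits(logit, target):
    """Numerically stable binary cross-entropy for a single logit."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

def rpn_style_loss(logits, anchor_labels, box_errors):
    """Return (objectness_loss, regression_loss) for one image.

    logits: objectness logit per anchor.
    anchor_labels: 1 (positive), 0 (negative) or -1 (ignored) per anchor.
    box_errors: per-anchor regression error, only counted for positives.
    """
    sampled = [i for i, lab in enumerate(anchor_labels) if lab >= 0]
    if not sampled:
        return 0.0, 0.0
    cls_loss = sum(
        bce_with_logits(logits[i], float(anchor_labels[i])) for i in sampled
    ) / len(sampled)
    # Only positive anchors contribute; with none, this term is exactly 0.
    reg_loss = sum(
        box_errors[i] for i in sampled if anchor_labels[i] == 1
    ) / len(sampled)
    return cls_loss, reg_loss

# Healthy image: all anchors negative, so only the objectness term is active.
cls_healthy, reg_healthy = rpn_style_loss([0.0, 0.0], [0, 0], [5.0, 7.0])
# Image with one positive anchor: the regression term switches on.
cls_pos, reg_pos = rpn_style_loss([0.0, 0.0], [1, 0], [4.0, 7.0])
```

So "disabling" the regression loss does not need a special branch: it falls out of summing over an empty set of positives.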

Alternatively, an interesting experiment would be to try either of the following:

1. A cascade of two models, in which the first model performs binary classification to decide whether the image has ANY issues. A second model then takes the images that pass a confidence threshold from the first model and runs Faster R-CNN-like object detection on them.
2. A third network head on the Faster R-CNN model that handles the binary classification task above. This is probably not a good idea, but it would be fun to try.
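
The cascade idea could be wired up roughly like this. Everything here is hypothetical: `classifier` and `detector` stand in for whatever models you train, and the 0.5 threshold is an arbitrary placeholder you would tune on validation data.

```python
def cascade_predict(image, classifier, detector, threshold=0.5):
    """Run the detector only when the classifier thinks something is wrong.

    classifier: callable returning the probability that the image is abnormal.
    detector: callable returning a list of predicted boxes.
    """
    p_abnormal = classifier(image)
    if p_abnormal < threshold:
        # Below the threshold: report "no anomalies" and skip detection.
        return {"abnormal_prob": p_abnormal, "boxes": []}
    return {"abnormal_prob": p_abnormal, "boxes": detector(image)}

# Stand-in models so the control flow can be exercised without real networks.
def fake_classifier(image):
    return image["score"]

def fake_detector(image):
    return [(30, 40, 120, 160)]

healthy_result = cascade_predict({"score": 0.1}, fake_classifier, fake_detector)
abnormal_result = cascade_predict({"score": 0.9}, fake_classifier, fake_detector)
```

One trade-off to keep in mind: anything the first stage misses never reaches the detector, so the threshold probably needs to favour recall.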

I definitely wouldn’t recommend using the whole image as a bounding box. I guess you would give it the label of 0, the background class.