Hi all, I have some confusion regarding the MobileNet example provided in the Faster R-CNN code.
In particular, copying the code as given in the example:
import torch
import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator

# load a pre-trained model for classification and return
# only the features
backbone = torchvision.models.mobilenet_v2(pretrained=False).features
# FasterRCNN needs to know the number of
# output channels in a backbone. For mobilenet_v2, it's 1280
# so we need to add it here
backbone.out_channels = 1280

# let's make the RPN generate 5 x 3 anchors per spatial
# location, with 5 different sizes and 3 different aspect
# ratios. We have a Tuple[Tuple[int]] because each feature
# map could potentially have different sizes and
# aspect ratios
anchor_generator = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
                                   aspect_ratios=((0.5, 1.0, 2.0),))

# let's define what are the feature maps that we will
# use to perform the region of interest cropping, as well as
# the size of the crop after rescaling.
# if your backbone returns a Tensor, featmap_names is expected to
# be [0]. More generally, the backbone should return an
# OrderedDict[Tensor], and in featmap_names you can choose which
# feature maps to use.
roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=[0],
                                                output_size=7,
                                                sampling_ratio=2)

# put the pieces together inside a FasterRCNN model
model = FasterRCNN(backbone,
                   num_classes=2,
                   rpn_anchor_generator=anchor_generator,
                   box_roi_pool=roi_pooler)

model.eval()
x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
predictions = model(x)
Running this code from a fresh Docker image (built last night) gives the following error trace:
Traceback (most recent call last):
  File "mobileNetExample.py", line 38, in <module>
    predictions = model(x)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 539, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torchvision/models/detection/generalized_rcnn.py", line 71, in forward
    detections, detector_losses = self.roi_heads(features, proposals, images.image_sizes, targets)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 539, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torchvision/models/detection/roi_heads.py", line 756, in forward
    box_features = self.box_roi_pool(features, proposals, image_shapes)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 539, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torchvision/ops/poolers.py", line 188, in forward
    self.setup_scales(x_filtered, image_shapes)
  File "/opt/conda/lib/python3.7/site-packages/torchvision/ops/poolers.py", line 161, in setup_scales
    lvl_min = -torch.log2(torch.tensor(scales, dtype=torch.float32)).item()
IndexError: list index out of range
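For what it's worth, here is my understanding of why the filtered list ends up empty. This is a simplified sketch of the filtering step in MultiScaleRoIAlign (not the actual torchvision code), assuming the model wraps a Tensor backbone output under the string key '0':

```python
from collections import OrderedDict

def filter_feature_maps(features, featmap_names):
    # simplified stand-in for the x_filtered step in
    # torchvision/ops/poolers.py: keep only the feature maps
    # whose key appears in featmap_names
    return [v for k, v in features.items() if k in featmap_names]

# a Tensor-returning backbone gets wrapped under the *string* key '0'
features = OrderedDict([("0", "feature_map_tensor")])

# featmap_names=[0] (an int) matches no string key, so nothing is
# selected; scales is then empty and indexing it raises
# IndexError: list index out of range
assert filter_feature_maps(features, [0]) == []

# featmap_names=['0'] matches the key and selects the feature map
assert filter_feature_maps(features, ["0"]) == ["feature_map_tensor"]
```

If this reading is right, the example's `featmap_names=[0]` silently selects zero feature maps, which would explain the trace above.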
Making either of the following changes lets the evaluation at least run through:
model = FasterRCNN(backbone, num_classes=2, rpn_anchor_generator=anchor_generator) #using the default roi_pooler
roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=['0'], output_size=7, sampling_ratio=2)
Looking at the type annotations for the pooler, it seems the input should be a List[str], and it's not clear to me how passing featmap_names=[0] is supposed to be handled.
Ultimately, I'm not sure how to build a Faster R-CNN following the MobileNet example (e.g. using a backbone that returns a Tensor rather than a dict). Is the example provided in the code actually correct? If not, would setting featmap_names=['0'] provide correct behavior, or is it necessary to wrap the outputs up into a singleton dict?
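In case it clarifies the question, this is the kind of singleton-dict wrapper I have in mind. It's a hypothetical sketch (the class name `DictBackbone` is mine, and in real code it would subclass `torch.nn.Module`; torch is left out here just to show the shape of the idea):

```python
from collections import OrderedDict

class DictBackbone:
    # hypothetical wrapper: turn a Tensor-returning backbone into one
    # that returns an OrderedDict keyed by '0', so that
    # featmap_names=['0'] matches something.
    # In practice this would be a torch.nn.Module with a forward().
    def __init__(self, backbone, out_channels):
        self.backbone = backbone
        # FasterRCNN still needs out_channels on the backbone
        self.out_channels = out_channels

    def forward(self, x):
        return OrderedDict([("0", self.backbone(x))])

# usage sketch with a stand-in for mobilenet_v2(...).features
fake_backbone = lambda x: x
wrapped = DictBackbone(fake_backbone, out_channels=1280)
assert list(wrapped.forward("img").keys()) == ["0"]
```

Is this wrapping step actually necessary, or does FasterRCNN already do the equivalent internally when the backbone returns a plain Tensor?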
Thanks for any insight you might have!