Hi all, I have some confusion regarding the MobileNet example provided in the Faster R-CNN code.
In particular, copying the code as given in the example:
import torch
import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator

# load a pre-trained model for classification and return
# only the features
backbone = torchvision.models.mobilenet_v2(pretrained=False).features
# FasterRCNN needs to know the number of
# output channels in a backbone. For mobilenet_v2, it's 1280
# so we need to add it here
backbone.out_channels = 1280

# let's make the RPN generate 5 x 3 anchors per spatial
# location, with 5 different sizes and 3 different aspect
# ratios. We have a Tuple[Tuple[int]] because each feature
# map could potentially have different sizes and
# aspect ratios
anchor_generator = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
                                   aspect_ratios=((0.5, 1.0, 2.0),))

# let's define what are the feature maps that we will
# use to perform the region of interest cropping, as well as
# the size of the crop after rescaling.
# if your backbone returns a Tensor, featmap_names is expected to
# be [0]. More generally, the backbone should return an
# OrderedDict[Tensor], and in featmap_names you can choose which
# feature maps to use.
roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=[0],
                                                output_size=7,
                                                sampling_ratio=2)

# put the pieces together inside a FasterRCNN model
model = FasterRCNN(backbone,
                   num_classes=2,
                   rpn_anchor_generator=anchor_generator,
                   box_roi_pool=roi_pooler)

model.eval()
x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
predictions = model(x)
Running this code from a fresh Docker image (built last night) gives the following error trace:
Traceback (most recent call last):
  File "mobileNetExample.py", line 38, in <module>
    predictions = model(x)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 539, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torchvision/models/detection/generalized_rcnn.py", line 71, in forward
    detections, detector_losses = self.roi_heads(features, proposals, images.image_sizes, targets)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 539, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torchvision/models/detection/roi_heads.py", line 756, in forward
    box_features = self.box_roi_pool(features, proposals, image_shapes)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 539, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torchvision/ops/poolers.py", line 188, in forward
    self.setup_scales(x_filtered, image_shapes)
  File "/opt/conda/lib/python3.7/site-packages/torchvision/ops/poolers.py", line 161, in setup_scales
    lvl_min = -torch.log2(torch.tensor(scales, dtype=torch.float32)).item()
IndexError: list index out of range
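For what it's worth, here is my understanding of why the filtered list ends up empty. This is a simplified sketch of the filtering step in MultiScaleRoIAlign (not the actual torchvision code), assuming the model wraps a Tensor backbone output under the string key '0':

```python
from collections import OrderedDict

def filter_feature_maps(features, featmap_names):
    # simplified stand-in for the x_filtered step in
    # torchvision/ops/poolers.py: keep only the feature maps
    # whose key appears in featmap_names
    return [v for k, v in features.items() if k in featmap_names]

# a Tensor-returning backbone gets wrapped under the *string* key '0'
features = OrderedDict([("0", "feature_map_tensor")])

# featmap_names=[0] (an int) matches no string key, so nothing is
# selected; scales is then empty and indexing it raises
# IndexError: list index out of range
assert filter_feature_maps(features, [0]) == []

# featmap_names=['0'] matches the key and selects the feature map
assert filter_feature_maps(features, ["0"]) == ["feature_map_tensor"]
```

If this reading is right, the example's `featmap_names=[0]` silently selects zero feature maps, which would explain the trace above.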
Making either of the following changes lets the evaluation at least run through:
model = FasterRCNN(backbone, num_classes=2, rpn_anchor_generator=anchor_generator) #using the default roi_pooler
roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=['0'], output_size=7, sampling_ratio=2)
Looking at the type annotations for the pooler, it seems the input should be a List[str], and it's not clear to me how passing featmap_names=[0] is supposed to be handled.
Ultimately, I'm not sure how to build a Faster R-CNN following the MobileNet example (e.g. using a backbone that returns a Tensor rather than a dict). Is the example provided in the code actually correct? If not, would setting featmap_names=['0'] provide correct behavior, or is it necessary to wrap the outputs up into a singleton dict?
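In case it clarifies the question, this is the kind of singleton-dict wrapper I have in mind. It's a hypothetical sketch (the class name `DictBackbone` is mine, and in real code it would subclass `torch.nn.Module`; torch is left out here just to show the shape of the idea):

```python
from collections import OrderedDict

class DictBackbone:
    # hypothetical wrapper: turn a Tensor-returning backbone into one
    # that returns an OrderedDict keyed by '0', so that
    # featmap_names=['0'] matches something.
    # In practice this would be a torch.nn.Module with a forward().
    def __init__(self, backbone, out_channels):
        self.backbone = backbone
        # FasterRCNN still needs out_channels on the backbone
        self.out_channels = out_channels

    def forward(self, x):
        return OrderedDict([("0", self.backbone(x))])

# usage sketch with a stand-in for mobilenet_v2(...).features
fake_backbone = lambda x: x
wrapped = DictBackbone(fake_backbone, out_channels=1280)
assert list(wrapped.forward("img").keys()) == ["0"]
```

Is this wrapping step actually necessary, or does FasterRCNN already do the equivalent internally when the backbone returns a plain Tensor?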
Thanks for any insight you might have!