Confusion re: Faster-RCNN Mobilenet Example

Hi all, I have some confusion regarding the mobilenet example provided in the F-RCNN Code.

In particular, copying the code as given in the example:

import torch
import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
# load a pre-trained model for classification and return
# only the features
backbone = torchvision.models.mobilenet_v2(pretrained=False).features

# FasterRCNN needs to know the number of
# output channels in a backbone. For mobilenet_v2, it's 1280
# so we need to add it here
backbone.out_channels = 1280

# let's make the RPN generate 5 x 3 anchors per spatial
# location, with 5 different sizes and 3 different aspect
# ratios. We have a Tuple[Tuple[int]] because each feature
# map could potentially have different sizes and
# aspect ratios
anchor_generator = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
                                   aspect_ratios=((0.5, 1.0, 2.0),))

# let's define what are the feature maps that we will
# use to perform the region of interest cropping, as well as
# the size of the crop after rescaling.
# if your backbone returns a Tensor, featmap_names is expected to
# be [0]. More generally, the backbone should return an
# OrderedDict[Tensor], and in featmap_names you can choose which
# feature maps to use.
roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=[0],
                                                output_size=7,
                                                sampling_ratio=2)

# put the pieces together inside a FasterRCNN model
model = FasterRCNN(backbone, num_classes=2, rpn_anchor_generator=anchor_generator, box_roi_pool=roi_pooler)

# switch to inference mode so the forward pass does not expect targets
model.eval()
x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
predictions = model(x)

Running this code in a fresh Docker image (built last night) gives the following error trace:

Traceback (most recent call last):
  File "", line 38, in <module>
    predictions = model(x)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/", line 539, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torchvision/models/detection/", line 71, in forward
    detections, detector_losses = self.roi_heads(features, proposals, images.image_sizes, targets)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/", line 539, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torchvision/models/detection/", line 756, in forward
    box_features = self.box_roi_pool(features, proposals, image_shapes)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/", line 539, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torchvision/ops/", line 188, in forward
    self.setup_scales(x_filtered, image_shapes)
  File "/opt/conda/lib/python3.7/site-packages/torchvision/ops/", line 161, in setup_scales
    lvl_min = -torch.log2(torch.tensor(scales[0], dtype=torch.float32)).item()
IndexError: list index out of range

Making either of the following changes lets the evaluation at least run through:

model = FasterRCNN(backbone, num_classes=2, rpn_anchor_generator=anchor_generator) #using the default roi_pooler


roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=['0'],
                                                output_size=7,
                                                sampling_ratio=2)

Looking at the typing notes for the pooler, it seems like the input should be a list[str], and it’s not clear to me how passing featmap_names=[0] is supposed to be handled.

Ultimately, I’m not sure how to build an F-RCNN following the MobileNet example (e.g. using a backbone that returns a tensor rather than a dict). Is the example provided in the code actually correct? If not, would setting featmap_names=['0'] provide correct behavior, or is it necessary to wrap the outputs up into a singleton dict?

Thanks for any insight you might have!

Just a quick update:

It looks as though at some point the MultiScaleRoIAlign constructor moved from using a list of indices to a list of keys for featmap_names, while the docstrings weren’t updated to reflect this.
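To make the failure mode concrete, here is a plain-Python sketch of what I understand to be happening. It is a simplified paraphrase of the key-filtering step, not the library's actual source: `select_feature_maps` is a hypothetical name, and the placeholder string stands in for the feature tensor. The assumption is that a tensor returned by the backbone ends up in a dict under the string key "0", and the pooler keeps only the maps whose key appears in featmap_names.

```python
from collections import OrderedDict


def select_feature_maps(features, featmap_names):
    """Keep only the maps whose dict key appears in featmap_names."""
    return [v for k, v in features.items() if k in featmap_names]


# Stand-in for the dict built around a tensor-returning backbone's output:
# the single feature map is stored under the string key "0".
features = OrderedDict([("0", "feature-map-placeholder")])

# The integer 0 never equals the string "0", so nothing is selected.
print(select_feature_maps(features, [0]))    # []
print(select_feature_maps(features, ["0"]))  # ['feature-map-placeholder']

# One scale per selected map; with no maps, indexing the first scale fails
# the same way as the last frame of the traceback.
scales = [1.0 for _ in select_feature_maps(features, [0])]
try:
    scales[0]
except IndexError as exc:
    print("IndexError:", exc)
```

That empty list is why the error surfaces in setup_scales rather than at construction time: nothing checks the names until the first forward pass.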

This has been corrected in this commit. The correct behavior was being emulated by

roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=['0'],
                                                output_size=7,
                                                sampling_ratio=2)

Can you tell me which version of PyTorch you were using? Was it 1.5?

I’d have to go back and double-check the dates of the various versions to be sure. At the time I was using the latest stable builds of both PyTorch and torchvision, but I believe this was PyTorch 1.5 and torchvision 0.4?

Were you able to get correct results out of the model? It seems that the targets are always modified by a simple forward pass. Check this. If possible, can you tell me whether I am doing something wrong or whether it is a bug? You can comment on that post. Please excuse any mistakes; I am new to this.

Maybe you can use torch 1.4.0 and torchvision 0.5.0. Here is a URL for the download:

Will give it a try. Thanks bud

Thank you, Mertens.
This definitely works for torch 1.7

I’m on PyTorch 1.9.0 and Torch Vision 0.10.0 and [0] (which is still in the tutorial) doesn’t work, while ['0'] does.