Confusion re: Faster-RCNN Mobilenet Example

Mertens · January 13, 2020, 5:35pm

Hi all, I’m confused by the mobilenet example provided in the Faster R-CNN code.

In particular, copying the code as given in the example:

import torch
import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
# load a pre-trained model for classification and return
# only the features
backbone = torchvision.models.mobilenet_v2(pretrained=False).features

# FasterRCNN needs to know the number of
# output channels in a backbone. For mobilenet_v2, it's 1280
# so we need to add it here
backbone.out_channels = 1280

# let's make the RPN generate 5 x 3 anchors per spatial
# location, with 5 different sizes and 3 different aspect
# ratios. We have a Tuple[Tuple[int]] because each feature
# map could potentially have different sizes and
# aspect ratios
anchor_generator = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
                                   aspect_ratios=((0.5, 1.0, 2.0),))

# let's define what are the feature maps that we will
# use to perform the region of interest cropping, as well as
# the size of the crop after rescaling.
# if your backbone returns a Tensor, featmap_names is expected to
# be [0]. More generally, the backbone should return an
# OrderedDict[Tensor], and in featmap_names you can choose which
# feature maps to use.
roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=[0],
                                                output_size=7,
                                                sampling_ratio=2)

# put the pieces together inside a FasterRCNN model
model = FasterRCNN(backbone, num_classes=2, rpn_anchor_generator=anchor_generator, box_roi_pool=roi_pooler)

model.eval()
x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
predictions = model(x)

Running the code from a fresh docker image (built last night) gives the following error trace:

Traceback (most recent call last):
  File "mobileNetExample.py", line 38, in <module>
    predictions = model(x)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 539, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torchvision/models/detection/generalized_rcnn.py", line 71, in forward
    detections, detector_losses = self.roi_heads(features, proposals, images.image_sizes, targets)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 539, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torchvision/models/detection/roi_heads.py", line 756, in forward
    box_features = self.box_roi_pool(features, proposals, image_shapes)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 539, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torchvision/ops/poolers.py", line 188, in forward
    self.setup_scales(x_filtered, image_shapes)
  File "/opt/conda/lib/python3.7/site-packages/torchvision/ops/poolers.py", line 161, in setup_scales
    lvl_min = -torch.log2(torch.tensor(scales[0], dtype=torch.float32)).item()
IndexError: list index out of range
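
The IndexError at the end of the trace is consistent with a key type mismatch. The sketch below is a simplified, pure-Python model of that situation, not torchvision's actual code. It assumes that a tensor returned by the backbone gets wrapped in a dict under the string key "0", and that the pooler keeps only the feature maps whose key is listed in featmap_names. The helper filter_feature_maps is a hypothetical name used for illustration.

```python
from collections import OrderedDict


def filter_feature_maps(features, featmap_names):
    """Keep only the feature maps whose key appears in featmap_names."""
    return [value for key, value in features.items() if key in featmap_names]


# Assumed wrapping of a single backbone output under the string key "0".
# The string value stands in for the real feature tensor.
features = OrderedDict([("0", "feature map")])

with_int_key = filter_feature_maps(features, [0])    # int 0 never equals "0"
with_str_key = filter_feature_maps(features, ["0"])  # string key matches

print(len(with_int_key))  # 0
print(len(with_str_key))  # 1

# With nothing left after filtering, indexing the first element fails,
# mirroring the scales[0] access in the traceback.
try:
    with_int_key[0]
except IndexError as err:
    print("IndexError:", err)  # IndexError: list index out of range
```

Under these assumptions, featmap_names=[0] selects no feature maps at all, so any later step that reads the first selected map or scale has nothing to index.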

Making either of the following changes lets the evaluation at least run through:

model = FasterRCNN(backbone, num_classes=2, rpn_anchor_generator=anchor_generator) #using the default roi_pooler

OR

roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=['0'],
                                                output_size=7,
                                                sampling_ratio=2)

Looking at the typing notes for the pooler, it seems like the input should be a list[str], and it’s not clear to me how passing featmap_names=[0] is supposed to be handled.

Ultimately, I’m not sure how to build an F-RCNN following the mobilenet example (i.e. using a backbone that returns a tensor rather than a dict). Is the example provided in the code actually correct? If not, would setting featmap_names=['0'] provide correct behavior, or is it necessary to wrap the outputs up into a singleton dict?

Thanks for any insight you might have!

Mertens · January 23, 2020, 9:02pm

Just a quick update:

It looks as though at some point the MultiScaleRoIAlign constructor moved from using a list of indices to a list of keys for featmap_names, while the docstrings weren’t updated to reflect this.

This has been corrected in this commit. The correct behavior is obtained with:

roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=['0'],
                                                output_size=7,
                                                sampling_ratio=2)

chhaya_kumar_das · May 13, 2020, 11:04am

Can you tell me which version of PyTorch you were using? Is it 1.5?

Mertens · May 13, 2020, 3:56pm

I’d have to go back and double-check the release dates of the various versions to be sure. At the time I was using the latest stable builds of both PyTorch and torchvision, which I believe were PyTorch 1.5 and torchvision 0.4?
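
For anyone comparing setups, one way to read the installed versions without importing the packages is sketched below. It uses only the standard library (Python 3.8+); installed_version is a hypothetical helper name, and it assumes the packages were installed under the distribution names torch and torchvision.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report both packages so the pairing can be quoted in a post.
for name in ("torch", "torchvision"):
    print(name, installed_version(name))
```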

chhaya_kumar_das · May 15, 2020, 7:34am

Were you able to get correct results out of the model? It seems that the targets are always modified by a simple forward pass. Check this. If possible, can you tell me whether I am doing something wrong or whether it is a bug? You can comment on that post. Please excuse any mistakes; I am new to this.

111317 · May 19, 2020, 12:00pm

Maybe you can try torch 1.4.0 and torchvision 0.5.0. Here is a download URL: https://download.pytorch.org/whl/torch_stable.html

chhaya_kumar_das · May 19, 2020, 12:02pm

Will give it a try. Thanks bud

bulatnv · November 11, 2020, 5:28am

Thank you, Mertens.
This definitely works for torch 1.7

douglasrizzo · June 21, 2021, 6:24am

I’m on PyTorch 1.9.0 and Torch Vision 0.10.0 and [0] (which is still in the tutorial) doesn’t work, while ['0'] does.