Index error using custom backbone on FasterRCNN

AdilZouitine · May 5, 2020, 1:00pm

Hello everyone, I’m following the object detection tutorial you’ll find here.

However, I want to use a VGG11 or ResNet18 backbone instead of the tutorial's MobileNet v2.

For VGG11, I replace:

backbone = torchvision.models.mobilenet_v2(pretrained=True).features
backbone.out_channels = 1280

by:

backbone = torchvision.models.vgg11_bn(pretrained=True, progress=True).features
backbone.out_channels = 512

For the ResNet18:

resnet18 = torchvision.models.resnet18(pretrained=True)
backbone = torch.nn.Sequential(*list(resnet18.children())[:-2])
backbone.out_channels = 512

I have the same error for both backbones:

If you have an explanation, I’m interested.

Torch : 1.5.0
Torchvision : 0.6.0

Thank you.

emcp · May 5, 2020, 7:47pm

Here is how I have been handling custom backbone networks, in this case ResNet-101:


class ModelResnet101FasterRCNN(FasterRCNN):
    def __init__(self, data_conf, model_conf):
        print("Creating model")
        backbone_nn = torchvision.models.__dict__[model_conf["hyperParameters"]["net"]](pretrained=True)
        # This line above yields the equivalent of ...
        # backbone_nn = torchvision.models.resnet101(pretrained=True)
        # OR
        # backbone_nn = torchvision.models.wide_resnet101_2(pretrained=True)

        modules = list(backbone_nn.children())[:-1]  # delete the last fc layer.
        backbone_nn = nn.Sequential(*modules)
        for param in backbone_nn.parameters():
            param.requires_grad = False

        # FasterRCNN needs to know the number of
        # output channels in a backbone. For resnet101, it's 2048
        backbone_nn.out_channels = 2048

AdilZouitine · May 7, 2020, 12:19pm

Hello, I tried your solution, but it doesn’t work.
Here is the Google Colab notebook where I used your solution: here

Thank you for your help.

emcp · May 7, 2020, 3:22pm

It looks like you are not done yet. When using a custom backbone network, the tutorial states that you also need to instantiate the anchor generator and RoI pooling components and pass them to your model. I am not seeing that in your notebook.

From the tutorial, you need:

# FasterRCNN needs to know the number of
# output channels in a backbone. For mobilenet_v2, it's 1280
# so we need to add it here
backbone.out_channels = 1280

# let's make the RPN generate 5 x 3 anchors per spatial
# location, with 5 different sizes and 3 different aspect
# ratios. We have a Tuple[Tuple[int]] because each feature
# map could potentially have different sizes and
# aspect ratios
anchor_generator = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
                                   aspect_ratios=((0.5, 1.0, 2.0),))

# let's define what are the feature maps that we will
# use to perform the region of interest cropping, as well as
# the size of the crop after rescaling.
# if your backbone returns a Tensor, featmap_names is expected to
# be [0]. More generally, the backbone should return an
# OrderedDict[Tensor], and in featmap_names you can choose which
# feature maps to use.
roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=[0],
                                                output_size=7,
                                                sampling_ratio=2)

# put the pieces together inside a FasterRCNN model
model = FasterRCNN(backbone,
                   num_classes=2,
                   rpn_anchor_generator=anchor_generator,
                   box_roi_pool=roi_pooler)

Jimmy_Hall · May 14, 2020, 10:22pm

I got the same error following the tutorial exactly, but the FasterRCNN documentation:

https://github.com/pytorch/vision/blob/master/torchvision/models/detection/faster_rcnn.py

suggests that featmap_names in MultiScaleRoIAlign should be the string '0', not the integer 0. I replaced

roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=[0],
                                                output_size=7,
                                                sampling_ratio=2)

with

roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=['0'],
                                                output_size=7,
                                                sampling_ratio=2)

and the error on the forward pass went away. I haven’t verified that the model fine-tunes properly, though.

Hope this helps.

emcp · May 19, 2020, 5:38am

My work is here https://github.com/JRGEMCP/bootstrap-pytorch-torchvision-fasterrcnn

I opened a PR adding a hint to the documentation about that exact problem, but I am still not getting good results.

AdilZouitine · May 25, 2020, 8:27am

Hey, this is the solution.
This should be mentioned in the documentation.

Thanks to all of you

emcp · May 25, 2020, 7:04pm

The PR is waiting here: https://github.com/pytorch/tutorials/pull/979