Index error using custom backbone on FasterRCNN

Hello everyone, I’m following the object detection tutorial you’ll find here.

However, I want to use VGG11 or ResNet18 as the backbone instead of the MobileNet v2 used in the tutorial.

For VGG11, I replace:

backbone = torchvision.models.mobilenet_v2(pretrained=True).features
backbone.out_channels = 1280

with:

backbone = torchvision.models.vgg11_bn(pretrained=True, progress=True).features
backbone.out_channels = 512

For the ResNet18:

resnet18 = torchvision.models.resnet18(pretrained=True)
backbone = nn.Sequential(*list(resnet18.children())[:-2])
backbone.out_channels = 512

I have the same error for both backbones:

If you have an explanation, I’m interested.

Torch : 1.5.0
Torchvision : 0.6.0

Thank you.

Here is how I have been handling custom backbone networks, in this case for ResNet-101:

class ModelResnet101FasterRCNN(FasterRCNN):
    def __init__(self, data_conf, model_conf):
        print("Creating model")
        backbone_nn = torchvision.models.__dict__[model_conf["hyperParameters"]["net"]](pretrained=True)
        # This line above yields the equivalent of ...
        # backbone_nn = torchvision.models.resnet101(pretrained=True)
        # OR
        # backbone_nn = torchvision.models.wide_resnet101_2(pretrained=True)

        modules = list(backbone_nn.children())[:-1]  # delete the last fc layer.
        backbone_nn = nn.Sequential(*modules)
        for param in backbone_nn.parameters():
            param.requires_grad = False

        # FasterRCNN needs to know the number of
        # output channels in a backbone. For resnet101, it's 2048
        backbone_nn.out_channels = 2048

Hello, I tried your solution but it doesn’t work.
Here is the link to the Google Colab notebook where I used your solution: here

Thank you for your help.

It looks like you are not done yet. When using a custom backbone network, the tutorial states that you also need to instantiate the anchor generator and RoI pooling components and attach them to your model. I am not seeing that in your notebook.

From the tutorial, you need:

# FasterRCNN needs to know the number of
# output channels in a backbone. For mobilenet_v2, it's 1280
# so we need to add it here
backbone.out_channels = 1280

# let's make the RPN generate 5 x 3 anchors per spatial
# location, with 5 different sizes and 3 different aspect
# ratios. We have a Tuple[Tuple[int]] because each feature
# map could potentially have different sizes and
# aspect ratios
anchor_generator = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
                                   aspect_ratios=((0.5, 1.0, 2.0),))

# let's define what are the feature maps that we will
# use to perform the region of interest cropping, as well as
# the size of the crop after rescaling.
# if your backbone returns a Tensor, featmap_names is expected to
# be [0]. More generally, the backbone should return an
# OrderedDict[Tensor], and in featmap_names you can choose which
# feature maps to use.
roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=[0],

# put the pieces together inside a FasterRCNN model
model = FasterRCNN(backbone,

I got the same error following the tutorial exactly, but the FasterRCNN documentation suggests that the featmap_names in MultiScaleRoIAlign should be a string, '0', not an integer. I replaced

roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=[0],

with

roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=['0'],

and the error on the forward pass went away. I haven’t verified that the model fine-tunes appropriately, though.
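
A likely reason the type matters, as far as I understand it and without having checked it line by line against the 0.6.0 source: when the backbone returns a single Tensor, the detection model wraps it in a dict under the string key '0', and the RoI pooler keeps only the feature maps whose key appears in featmap_names. With the integer 0 nothing matches, so the pooler is left with an empty list to index into. The sketch below is a plain-Python stand-in for that lookup, not torchvision's actual code; select_feature_maps is a hypothetical helper:

```python
from collections import OrderedDict

# Stand-in for the dict built when the backbone returns one Tensor.
# The string value is a placeholder for the real feature map.
features = OrderedDict([("0", "feature map tensor")])


def select_feature_maps(features, featmap_names):
    """Keep only the feature maps whose key is listed in featmap_names."""
    return [value for key, value in features.items() if key in featmap_names]


with_int_key = select_feature_maps(features, [0])    # integer 0 != string "0"
with_str_key = select_feature_maps(features, ["0"])  # key matches

print(len(with_int_key))  # 0
print(len(with_str_key))  # 1
```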

Hope this helps.

My work is here

I opened a PR adding a hint to the documentation about that exact problem, but I am still not getting good results.

Hey, this is the solution.
This should be noted in the documentation.

Thanks to all of you :)

the PR is waiting here
