FasterRCNN with custom backbone architecture - Error in Anchor Generator object

I am trying to build a FasterRCNN-based object detection framework with a custom backbone. The backbone is the encoder of the Mix Vision Transformer (MiT) architecture implemented in the Segmentation Models PyTorch library (qubvel/segmentation_models.pytorch on GitHub). The encoder returns four feature maps with the following sizes:

torch.Size([1, 64, 320, 180])
torch.Size([1, 128, 160, 90])
torch.Size([1, 320, 80, 45])
torch.Size([1, 512, 40, 23])

I passed these features through torchvision's FeaturePyramidNetwork (from torchvision.ops.feature_pyramid_network import FeaturePyramidNetwork), which transformed them into feature maps of sizes:

torch.Size([1, 256, 320, 180])
torch.Size([1, 256, 160, 90])
torch.Size([1, 256, 80, 45])
torch.Size([1, 256, 40, 23])

I set the backbone's out_channels attribute to 256 and then used the following code to construct the FasterRCNN model:

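import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.anchor_utils import AnchorGenerator

backbone_fpn = BackboneWithFPN(backbone, [64,128,320,512], 256)
backbone_fpn.out_channels = 256

anchor_generator = AnchorGenerator(sizes=((16, 32, 64, 128),),
                                   aspect_ratios=((0.5, 1.0, 2.0),))
roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=['0','1','2','3'],
    # (rest of this call was cut off in the original post)
model = FasterRCNN(
    num_classes = 5,
    # (rest of this call was cut off in the original post)
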
The model object was constructed; however, whenever I run a forward pass, I get the following error:

Traceback (most recent call last):
  File "/data_fast/venkatesh/carla/", line 67, in <module>
    pred = model(list(image))
  File "/home/venkatesh/anaconda3/envs/cscapes/lib/python3.10/site-packages/torch/nn/modules/", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/venkatesh/anaconda3/envs/cscapes/lib/python3.10/site-packages/torchvision/models/detection/", line 104, in forward
    proposals, proposal_losses = self.rpn(images, features, targets)
  File "/home/venkatesh/anaconda3/envs/cscapes/lib/python3.10/site-packages/torch/nn/modules/", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/venkatesh/anaconda3/envs/cscapes/lib/python3.10/site-packages/torchvision/models/detection/", line 361, in forward
    anchors = self.anchor_generator(images, features)
  File "/home/venkatesh/anaconda3/envs/cscapes/lib/python3.10/site-packages/torch/nn/modules/", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/venkatesh/anaconda3/envs/cscapes/lib/python3.10/site-packages/torchvision/models/detection/", line 127, in forward
    anchors_over_all_feature_maps = self.grid_anchors(grid_sizes, strides)
  File "/home/venkatesh/anaconda3/envs/cscapes/lib/python3.10/site-packages/torchvision/models/detection/", line 88, in grid_anchors
  File "/home/venkatesh/anaconda3/envs/cscapes/lib/python3.10/site-packages/torch/", line 827, in _assert
    assert condition, message
AssertionError: Anchors should be Tuple[Tuple[int]] because each feature map could potentially have different sizes and aspect ratios. There needs to be a match between the number of feature maps passed and the number of sizes / aspect ratios specified.

I tried altering the sizes and aspect_ratios parameters of the AnchorGenerator object, but nothing worked. Could someone please help me with this issue?

Did you get this working?
Perhaps changing the AnchorGenerator sizes to ((16,), (32,), (64,), (128,)), i.e. one tuple per feature map, will fix the issue?

It worked after I made that change! Thank you!

