FasterRCNN with custom backbone architecture - Error in Anchor Generator object

I am trying to build a FasterRCNN-based object detection framework with a custom backbone. The backbone is the encoder of the Mix Vision Transformer (MiT) architecture implemented in the Segmentation Models PyTorch library (GitHub - qubvel/segmentation_models.pytorch: Segmentation models with pretrained backbones. PyTorch.). The encoder returns 4 feature maps of sizes

torch.Size([1, 64, 320, 180])
torch.Size([1, 128, 160, 90])
torch.Size([1, 320, 80, 45])
torch.Size([1, 512, 40, 23])
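Since torchvision's FPN utilities expect an OrderedDict of named feature maps rather than a plain list, I wrap the encoder roughly like this (a sketch: EncoderWrapper is my own naming, and DummyEncoder below just stands in for the smp MiT encoder so the shapes match):

```python
from collections import OrderedDict

import torch
from torch import nn


class EncoderWrapper(nn.Module):
    """Turns an encoder that returns a list of feature maps into the
    OrderedDict format that torchvision's FPN utilities expect."""

    def __init__(self, encoder):
        super().__init__()
        self.encoder = encoder

    def forward(self, x):
        feats = self.encoder(x)  # list of 4 tensors
        return OrderedDict((str(i), f) for i, f in enumerate(feats))


class DummyEncoder(nn.Module):
    """Stand-in for the MiT encoder, emitting the four shapes above."""

    def forward(self, x):
        return [
            torch.rand(1, 64, 320, 180),
            torch.rand(1, 128, 160, 90),
            torch.rand(1, 320, 80, 45),
            torch.rand(1, 512, 40, 23),
        ]


wrapped = EncoderWrapper(DummyEncoder())
out = wrapped(torch.rand(1, 3, 1280, 720))
```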

I passed these features through torchvision.ops.feature_pyramid_network.FeaturePyramidNetwork, which transformed them into feature maps of sizes

torch.Size([1, 256, 320, 180])
torch.Size([1, 256, 160, 90])
torch.Size([1, 256, 80, 45])
torch.Size([1, 256, 40, 23])

I set the backbone.out_channels attribute to 256, and then used the following code to construct the FasterRCNN model:

backbone_fpn = BackboneWithFPN(backbone, [64, 128, 320, 512], 256)
backbone_fpn.out_channels = 256

anchor_generator = AnchorGenerator(sizes=((16, 32, 64, 128),),
                                   aspect_ratios=((0.5, 1.0, 2.0),))
# output_size/sampling_ratio got cut off in my paste; typical values shown
roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=['0', '1', '2', '3'],
                                                output_size=7,
                                                sampling_ratio=2)
model = FasterRCNN(
    backbone_fpn,
    num_classes=5,
    rpn_anchor_generator=anchor_generator,
    box_roi_pool=roi_pooler,
)

The model object was constructed successfully; however, whenever I invoke a forward pass I get the following error.

Traceback (most recent call last):
  File "/data_fast/venkatesh/carla/", line 67, in <module>
    pred = model(list(image))
  File "/home/venkatesh/anaconda3/envs/cscapes/lib/python3.10/site-packages/torch/nn/modules/", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/venkatesh/anaconda3/envs/cscapes/lib/python3.10/site-packages/torchvision/models/detection/", line 104, in forward
    proposals, proposal_losses = self.rpn(images, features, targets)
  File "/home/venkatesh/anaconda3/envs/cscapes/lib/python3.10/site-packages/torch/nn/modules/", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/venkatesh/anaconda3/envs/cscapes/lib/python3.10/site-packages/torchvision/models/detection/", line 361, in forward
    anchors = self.anchor_generator(images, features)
  File "/home/venkatesh/anaconda3/envs/cscapes/lib/python3.10/site-packages/torch/nn/modules/", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/venkatesh/anaconda3/envs/cscapes/lib/python3.10/site-packages/torchvision/models/detection/", line 127, in forward
    anchors_over_all_feature_maps = self.grid_anchors(grid_sizes, strides)
  File "/home/venkatesh/anaconda3/envs/cscapes/lib/python3.10/site-packages/torchvision/models/detection/", line 88, in grid_anchors
  File "/home/venkatesh/anaconda3/envs/cscapes/lib/python3.10/site-packages/torch/", line 827, in _assert
    assert condition, message
AssertionError: Anchors should be Tuple[Tuple[int]] because each feature map could potentially have different sizes and aspect ratios. There needs to be a match between the number of feature maps passed and the number of sizes / aspect ratios specified.

I tried altering the sizes and aspect_ratios parameters of the AnchorGenerator object, but nothing worked. Could someone please help me with this issue?