I am trying to a FasterRCNN based object detection framework with a custom backbone. The backbone is the encoder architecture of Mix Vision Transformer architecture implemented in the Segmentation Models Pytorch library(GitHub - qubvel/segmentation_models.pytorch: Segmentation models with pretrained backbones. PyTorch.). The encoder returns 4 features maps of sizes
torch.Size([1, 64, 320, 180])
torch.Size([1, 128, 160, 90])
torch.Size([1, 320, 80, 45])
torch.Size([1, 512, 40, 23])
I passed these features through from torchvision.ops.feature_pyramid_network import FeaturePyramidNetwork
and transformed them into feature maps of sizes
torch.Size([1, 256, 320, 180])
torch.Size([1, 256, 160, 90])
torch.Size([1, 256, 80, 45])
torch.Size([1, 256, 40, 23])
I set the backbone.out_channels parameter as 256, and then used the following code to construct the FasterRCNN model
backbone_fpn = BackboneWithFPN(backbone, [64,128,320,512], 256)
backbone_fpn.out_channels = 256
anchor_generator = AnchorGenerator(sizes=((16, 32, 64, 128),),
aspect_ratios=((0.5, 1.0, 2.0),))
roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=['0','1','2','3'],
output_size=7,
sampling_ratio=2)
model = FasterRCNN(
backbone=backbone_fpn,
num_classes = 5,
rpn_anchor_generator=anchor_generator,
box_roi_pool=roi_pooler
)
The model object was constructed however, whenever I invoke a feed forward operation I get the following issue.
Traceback (most recent call last):
File "/data_fast/venkatesh/carla/custom_fasterRCNN.py", line 67, in <module>
pred = model(list(image))
File "/home/venkatesh/anaconda3/envs/cscapes/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/home/venkatesh/anaconda3/envs/cscapes/lib/python3.10/site-packages/torchvision/models/detection/generalized_rcnn.py", line 104, in forward
proposals, proposal_losses = self.rpn(images, features, targets)
File "/home/venkatesh/anaconda3/envs/cscapes/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/home/venkatesh/anaconda3/envs/cscapes/lib/python3.10/site-packages/torchvision/models/detection/rpn.py", line 361, in forward
anchors = self.anchor_generator(images, features)
File "/home/venkatesh/anaconda3/envs/cscapes/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/home/venkatesh/anaconda3/envs/cscapes/lib/python3.10/site-packages/torchvision/models/detection/anchor_utils.py", line 127, in forward
anchors_over_all_feature_maps = self.grid_anchors(grid_sizes, strides)
File "/home/venkatesh/anaconda3/envs/cscapes/lib/python3.10/site-packages/torchvision/models/detection/anchor_utils.py", line 88, in grid_anchors
torch._assert(
File "/home/venkatesh/anaconda3/envs/cscapes/lib/python3.10/site-packages/torch/__init__.py", line 827, in _assert
assert condition, message
AssertionError: Anchors should be Tuple[Tuple[int]] because each feature map could potentially have different sizes and aspect ratios. There needs to be a match between the number of feature maps passed and the number of sizes / aspect ratios specified.
Tried to alter the sizes
and the aspect_ratio
parameters of AnchorGenerator
object, nothing worked. Could someone please help me with this issue?