Hi there,
I’m training a Faster R-CNN model with a ConvNeXt backbone. Here is my model creation code:
import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.anchor_utils import AnchorGenerator

backbone = torchvision.models.convnext_large(weights="DEFAULT").features
backbone.out_channels = 1536  # channel count of ConvNeXt-Large's last feature map

anchor_generator = AnchorGenerator(
    sizes=((8, 16, 24, 32),),
    aspect_ratios=((0.5, 0.25, 1/6),),
)

roi_pooler = torchvision.ops.MultiScaleRoIAlign(
    featmap_names=['0'],
    output_size=7,
    sampling_ratio=2,
)

model = FasterRCNN(
    backbone,
    num_classes=num_classes,
    rpn_anchor_generator=anchor_generator,
    box_roi_pool=roi_pooler,
)
I followed the TorchVision Object Detection Finetuning Tutorial (PyTorch Tutorials 2.2.0+cu121 documentation).
My problem is that during training the model behaves as if it starts from randomly initialized weights rather than pretrained ones: the mAP@50:95 starts from nearly zero. In contrast, if I train with a ResNet50 backbone with pretrained weights, the mAP@50:95 climbs much faster. I also tried training ResNet50 with random weights, and its mAP@50:95 curve looks just like the ConvNeXt run.
I should mention that I also tried passing pretrained=True to torchvision.models.convnext_large, and it made no difference.
Does anyone have a guess about the origin of the problem? In theory, training with ConvNeXt should reach a higher mAP@50:95, but I am seeing the opposite.
David.