I’m having a little trouble trying to train a Faster-RCNN model on COCO, with an ImageNet-pretrained
torchvision ConvNeXt as the backbone, as shown below:
import torch
import torchvision
import torchvision.models.detection as torchdet
from torchvision.models import convnext_tiny, ConvNeXt_Tiny_Weights

backbone = convnext_tiny(weights=ConvNeXt_Tiny_Weights.DEFAULT).features
# 768 determined using torchinfo.summary(backbone, (3,300,300))
backbone.out_channels = 768

# 5x3 anchors per location
anchor_generator = torchdet.rpn.AnchorGenerator(
    sizes=((32, 64, 128, 256, 512),),
    aspect_ratios=((0.5, 1.0, 2.0),))

roi_pooler = torchvision.ops.MultiScaleRoIAlign(
    featmap_names=['0', '1', '2', '3'],
    output_size=7,
    sampling_ratio=2)

# 91 classes in MS COCO
model = torchdet.FasterRCNN(
    backbone=backbone,
    num_classes=91,
    rpn_anchor_generator=anchor_generator,
    box_roi_pool=roi_pooler)
I’m trying to emulate the training recipes used by the Torchvision team, so my setup looks like this:
params = [p for p in model.parameters() if p.requires_grad]
# 0.0025 LR used because only using 1 GPU, Facebook used 0.02 for 8 GPUs
optimizer = torch.optim.SGD(params, lr=0.0025, momentum=0.9, weight_decay=1e-4)
lr_scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[16, 22], gamma=0.1)
# Using batch size 2
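For reference, this is roughly the arithmetic behind the 0.0025 learning rate and the schedule above. It is only a sketch: it assumes the reference recipe uses 2 images per GPU on 8 GPUs, that the linear scaling rule applies, and that the scheduler is stepped once per epoch. The helper names scaled_lr and multistep_lr are made up for illustration and are not part of torch or torchvision.

```python
def scaled_lr(base_lr, base_gpus, base_imgs_per_gpu, gpus, imgs_per_gpu):
    """Linear scaling rule: LR proportional to the total batch size."""
    base_batch = base_gpus * base_imgs_per_gpu
    my_batch = gpus * imgs_per_gpu
    return base_lr * my_batch / base_batch


def multistep_lr(initial_lr, milestones, gamma, epoch):
    """LR in effect during `epoch` (0-indexed) for a MultiStepLR-style schedule,
    assuming the scheduler is stepped once at the end of every epoch."""
    passed = sum(1 for m in milestones if epoch >= m)
    return initial_lr * gamma ** passed


# Assumed reference setup: lr=0.02 for 8 GPUs x 2 images; mine: 1 GPU x 2 images
lr = scaled_lr(0.02, base_gpus=8, base_imgs_per_gpu=2, gpus=1, imgs_per_gpu=2)
print(lr)

# LR at a few epochs with milestones=[16, 22], gamma=0.1
for epoch in (0, 15, 16, 22):
    print(epoch, multistep_lr(lr, [16, 22], 0.1, epoch))
```

If the scheduler were stepped per iteration instead of per epoch, the milestones would be hit after 16 and 22 iterations, so the values above only hold under the once-per-epoch assumption.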
However, within a single epoch the loss only drops to about 0.5 before jumping back up to about 1.8 over the following iterations. Is there something else I should change about the training to better match what the Torchvision team did when training its other Faster R-CNN models?
I’m also curious what the featmap_names argument in torchvision.ops.MultiScaleRoIAlign(featmap_names=['0', '1', '2', '3'], ...) is supposed to do.