I’m having a little trouble training a Faster R-CNN model on COCO with an ImageNet-pretrained torchvision ConvNeXt as the backbone, as shown below:
```python
import torch
import torchvision  # needed for torchvision.ops below
import torchvision.models.detection as torchdet
from torchvision.models import convnext_tiny, ConvNeXt_Tiny_Weights

backbone = convnext_tiny(weights=ConvNeXt_Tiny_Weights.DEFAULT).features
# 768 determined using torchinfo.summary(backbone, (3,300,300))
backbone.out_channels = 768
# 5 sizes x 3 aspect ratios = 15 anchors per location
anchor_generator = torchdet.rpn.AnchorGenerator(
    sizes=((32, 64, 128, 256, 512),), aspect_ratios=((0.5, 1.0, 2.0),))
roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=['0', '1', '2', '3'],
                                                output_size=7,
                                                sampling_ratio=2)
# 91 classes in MS COCO
model = torchdet.FasterRCNN(backbone=backbone, num_classes=91,
                            rpn_anchor_generator=anchor_generator, box_roi_pool=roi_pooler)
```
I’m trying to emulate the training recipes used by the Torchvision team, so my setup looks like this:
```python
params = [p for p in model.parameters() if p.requires_grad]
# LR of 0.0025 because I'm using only 1 GPU; Facebook used 0.02 for 8 GPUs
optimizer = torch.optim.SGD(params, lr=0.0025, momentum=0.9, weight_decay=1e-4)
lr_scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[16, 22], gamma=0.1)
# Using batch size 2
```
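For reference, the 0.0025 value comes from scaling the learning rate linearly with the total batch size. Below is a minimal sketch of that arithmetic, assuming the reference run used 2 images per GPU (16 images in total); the helper names `scaled_lr` and `multistep_lr` are hypothetical. The second helper tabulates the learning rate `MultiStepLR` would yield, assuming the scheduler is stepped once per epoch.

```python
def scaled_lr(reference_lr, reference_batch, my_batch):
    """Linear scaling rule: the LR grows in proportion to the total batch size."""
    return reference_lr * my_batch / reference_batch


def multistep_lr(base_lr, epoch, milestones=(16, 22), gamma=0.1):
    """LR at a given epoch when the LR is multiplied by gamma at each milestone."""
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed


# Assumed reference setup: 8 GPUs x 2 images each; my setup: 1 GPU x 2 images.
base_lr = scaled_lr(0.02, reference_batch=8 * 2, my_batch=1 * 2)

for epoch in (0, 15, 16, 21, 22):
    print(epoch, multistep_lr(base_lr, epoch))
```

Under these assumptions the learning rate stays at the base value through epoch 15, drops by a factor of 10 at epoch 16, and drops by another factor of 10 at epoch 22.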
However, within a single epoch the loss only drops to about 0.5 before jumping back up to about 1.8. Is there something else I should change about the training to better match what the Torchvision team did when training its other Faster R-CNN models?
I’m also curious what `featmap_names` in `torchvision.ops.MultiScaleRoIAlign(featmap_names=['0', '1', '2', '3'], ...)` is supposed to do.
Thanks!