Mask R-CNN optimizer and learning rate scheduler

JamesDickens · December 1, 2020, 4:28am

In the Mask R-CNN paper, the training setup for instance segmentation on MS COCO (I believe this is the 2014/2015 dataset; correct me if this is wrong) is described as follows:

We train on 8 GPUs (so effective minibatch
size is 16) for 160k iterations, with a learning rate of
0.02 which is decreased by 10 at the 120k iteration. We
use a weight decay of 0.0001 and momentum of 0.9. With
ResNeXt [45], we train with 1 image per GPU and the same
number of iterations, with a starting learning rate of 0.01.

I’m trying to write an optimizer and learning rate scheduler in PyTorch for a similar application, to match this description.

For the optimizer I have:

import torch

def get_Mask_RCNN_Optimizer(model, learning_rate=0.02):
    # SGD with the momentum and weight decay quoted from the paper
    optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, momentum=0.9, weight_decay=0.0001)
    return optimizer

For the learning rate scheduler I have:

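def get_MASK_RCNN_LR_Scheduler(optimizer, step_size):
    # gamma=0.1 multiplies the learning rate by 0.1 every step_size calls to scheduler.step()
    # verbose is left out because recent PyTorch releases deprecate it; scheduler.get_last_lr() reports the current rate
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=step_size, gamma=0.1)
    return scheduler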
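
A minimal sketch of how I imagine the optimizer and scheduler being used together, assuming the scheduler is stepped once per iteration (not once per epoch), since StepLR counts calls to scheduler.step(). The tiny Linear model, the dummy loss, and the scaled-down numbers (3 standing in for 120k, 4 for 160k) are placeholders for illustration, not anything from the paper.

```python
import torch

# Hypothetical stand-in model; any nn.Module with parameters would do here.
model = torch.nn.Linear(4, 2)

optimizer = torch.optim.SGD(model.parameters(), lr=0.02, momentum=0.9, weight_decay=0.0001)
# step_size counts calls to scheduler.step(); stepping once per iteration
# makes it an iteration count. 3 stands in for 120k.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.1)

lrs = []
for iteration in range(4):  # 4 stands in for 160k
    lrs.append(scheduler.get_last_lr()[0])  # rate used for this iteration
    loss = model(torch.randn(8, 4)).sum()   # dummy loss, illustration only
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()  # called after optimizer.step(), once per iteration

print(lrs)
```

With these placeholder numbers the rate stays at 0.02 for the first three iterations and drops to roughly 0.002 for the fourth.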

When the authors say “decreased by 10”, do they mean divide by 10? Or do they literally mean subtract 10, which would give a negative learning rate and seems odd/wrong? Any insights appreciated.
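
To make the two readings concrete, here is a small sketch in plain Python. The helper names are made up for illustration, and whether the change applies at or just after iteration 120k is my assumption. The divide reading is the one that corresponds to a multiplicative factor of 0.1 in a step scheduler.

```python
def lr_divide(iteration, base_lr=0.02, milestone=120_000, factor=10.0):
    """Rate if 'decreased by 10' means divide by 10 from the milestone on."""
    return base_lr / factor if iteration >= milestone else base_lr


def lr_subtract(iteration, base_lr=0.02, milestone=120_000, amount=10.0):
    """Rate if 'decreased by 10' literally means subtract 10."""
    return base_lr - amount if iteration >= milestone else base_lr


before = lr_divide(119_999)           # still the starting rate
after_divide = lr_divide(120_000)     # roughly 0.002
after_subtract = lr_subtract(120_000) # negative, so not a usable rate

print(before, after_divide, after_subtract)
```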