Why is the learning rate stepped twice in the object detection tutorial?

I noticed in the detection tutorial that the learning rate appears to be stepped in two different places, and I wanted to understand why.

https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html

    # construct an optimizer
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(params, lr=0.005,
                                momentum=0.9, weight_decay=0.0005)
    # and a learning rate scheduler
    lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
                                                   step_size=3,
                                                   gamma=0.1)

    # let's train it for 10 epochs
    num_epochs = 10

    for epoch in range(num_epochs):
        # train for one epoch, printing every 10 iterations
        train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10)
        # update the learning rate
        lr_scheduler.step()
        # evaluate on the test dataset
        evaluate(model, data_loader_test, device=device)

and then within the detection reference engine we have…

    def train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq):
        model.train()
        metric_logger = utils.MetricLogger(delimiter="  ")
        metric_logger.add_meter('lr', utils.SmoothedValue(window_size=1, fmt='{value:.6f}'))
        header = 'Epoch: [{}]'.format(epoch)

        # a per-iteration warmup scheduler is created only for the first epoch
        lr_scheduler = None
        if epoch == 0:
            warmup_factor = 1. / 1000
            warmup_iters = min(1000, len(data_loader) - 1)

            lr_scheduler = utils.warmup_lr_scheduler(optimizer, warmup_iters, warmup_factor)

        for images, targets in metric_logger.log_every(data_loader, print_freq, header):

            ...

            optimizer.zero_grad()
            losses.backward()
            optimizer.step()

            # stepped after every iteration, but only while the warmup scheduler exists
            if lr_scheduler is not None:
                lr_scheduler.step()

    ...

Any ideas why?

In the classification reference engine there is no such per-iteration LR stepping, nor is any warmup performed.

The schedulers are not identical: the StepLR is stepped once after each epoch, while the warmup_lr_scheduler (from the detection reference utils) is stepped inside the training loop, after every iteration of the first epoch.
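To make that concrete, here is a minimal, self-contained sketch of how the two schedulers interact. The dummy linear model and iters_per_epoch are made up for illustration, and the warmup is implemented here with LambdaLR to mimic what utils.warmup_lr_scheduler appears to do; this is not the reference code itself.

    import torch

    model = torch.nn.Linear(10, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)

    # per-epoch decay: multiply the LR by 0.1 every 3 epochs
    epoch_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.1)

    num_epochs = 10
    iters_per_epoch = 100  # hypothetical len(data_loader)

    for epoch in range(num_epochs):
        warmup_scheduler = None
        if epoch == 0:
            warmup_factor = 1.0 / 1000
            warmup_iters = min(1000, iters_per_epoch - 1)

            def warmup(step, warmup_iters=warmup_iters, warmup_factor=warmup_factor):
                # linearly ramp the LR multiplier from warmup_factor up to 1.0
                if step >= warmup_iters:
                    return 1.0
                alpha = step / warmup_iters
                return warmup_factor * (1 - alpha) + alpha

            warmup_scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, warmup)

        for it in range(iters_per_epoch):
            # stand-in for the real forward/backward pass
            loss = model(torch.randn(4, 10)).sum()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            if warmup_scheduler is not None:
                warmup_scheduler.step()   # stepped every iteration (epoch 0 only)

        epoch_scheduler.step()            # stepped once per epoch
        print(f"after epoch {epoch}: lr = {optimizer.param_groups[0]['lr']:.6f}")

Running this prints the base LR of 0.005 for the first three epochs (the warmup has finished ramping by the end of epoch 0) and then a tenfold drop every three epochs from the StepLR.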


Care to give a bit of background on why this is?

Is it that the training and scoring/evaluating both use StepLR? What does a warmup_lr_scheduler do? I had not seen this anywhere before.

The warmup technique is used during the first phase of training. It consists of starting with a small learning rate and a scheduler that increases it at every step until a certain number of steps (the warmup steps) is reached. So you need two different schedulers; as you can see from the code you posted, the warmup scheduler is used only during the first epoch.

reference: https://datascience.stackexchange.com/questions/55991/in-the-context-of-deep-learning-what-is-training-warmup-steps
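As a tiny arithmetic illustration of the ramp (the warmup_factor, warmup_iters, and base_lr values below are just examples), the effective learning rate grows linearly from base_lr * warmup_factor to base_lr over the warmup steps:

    # hypothetical values, only to illustrate the linear ramp
    warmup_factor = 1.0 / 1000
    warmup_iters = 1000
    base_lr = 0.005

    for step in (0, 250, 500, 750, 1000):
        alpha = min(step / warmup_iters, 1.0)
        factor = warmup_factor * (1 - alpha) + alpha   # ramps from 1/1000 up to 1.0
        print(f"step {step:4d}: lr = {base_lr * factor:.6f}")
    # step    0: lr = 0.000005
    # ...
    # step 1000: lr = 0.005000

After the warmup steps are exhausted, the factor stays at 1.0 and the per-epoch StepLR takes over the scheduling.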
