How to increase the learning rate without using cyclical learning rates?

Imahn · December 25, 2021, 9:51am

Hi,

I’d have a quick question: In the ResNet paper, in section 4.2, an architecture is described for the CIFAR10 dataset. When they describe using a very deep ResNet, they write:

So we use 0.01 to warm up the training until the training error is below
80% (about 400 iterations), and then go back to 0.1 and continue training.

Therefore, I would like to have the following learning rate schedule:

LR = 10^{-2} for the first two epochs,
LR = 10^{-1} for the rest of the training.

However, I haven’t found a way to achieve exactly this (cyclical learning rates would only gradually increase the learning rate, not at once). Any help would be appreciated. (-:

mMagmer · December 25, 2021, 11:25am

import torch
import torch.nn as nn
from torch.optim.lr_scheduler import ConstantLR , ReduceLROnPlateau, SequentialLR


model = nn.Sequential(nn.Linear(2,50),
                      nn.Sigmoid(),
                      nn.Linear(50,50),
                      nn.Sigmoid(),
                      nn.Linear(50,2)
                      )


criteria = nn.CrossEntropyLoss()
optim =  torch.optim.SGD(model.parameters(),lr=.1,momentum=.9)

def get_toyData(N1,N2):
    labels = torch.cat([torch.ones(N1),torch.zeros(N2)])
    data = torch.cat([torch.randn(N1,2),torch.randn(N2,2)+1.5])
    return data , labels 

N1 , N2 = 1000,1000 
train_data , train_labels = get_toyData(N1,N2)

dataset = torch.utils.data.TensorDataset(train_data,train_labels)
dl = torch.utils.data.DataLoader(dataset,batch_size=10,
                                 shuffle=True)

scheduler1 = ConstantLR(optim, factor=0.1, total_iters=40)
#scheduler2 = ExponentialLR(optim, gamma=0.9)
scheduler2 = ReduceLROnPlateau(optim, 'min')
    
for i in range(100):
    totalLoss = 0
    train_error = 0
    for x,t in dl:
        optim.zero_grad()
        out=model(x)
        loss = criteria(out,t.long())
        loss.backward()
        optim.step()
        totalLoss += loss.item()
    if i <40:
        
        scheduler1.step()
    else:
        scheduler2.step(totalLoss)
    print(i , optim.state_dict()['param_groups'][0]['lr'])
        
    #print(i,totalLoss,train_error/1300)

Assuming optimizer uses lr = 0.1 for all groups
lr = 0.01 if epoch < 40
lr = 0.1 if epoch =40
and ReduceLROnPlateau by .1 if epoch > 40

Imahn · December 25, 2021, 11:40am

First of all: Thanks! My apologies, but there’s one thing I forgot to mention in my original post, I’m afraid. Let me quote from the ResNet paper, section 4.2 again:

So we use 0.01 to warm up the training until the training error is below
80% (about 400 iterations), and then go back to 0.1 and continue training. The rest of the learning schedule is as done previously.

By “as done previously”, this is meant:

We start with a learning rate of 0.1, divide it by 10 at 32k and 48k iterations, and
terminate training at 64k iterations, […].

Basically, something like this should be our learning rate schedule:
lr = 0.01 if num_iters < 400,
lr = 0.1 if 400 <= num_iters < 32k,
lr = 0.01 if 32k <= num_iters < 48k,
lr = 0.001 if 48k <= num_iters < 60k.

I’d also be happy to use epochs instead of number of iterations, but I’m not sure how to achieve either. )-:

mMagmer · December 26, 2021, 11:01am

hi,

scheduler1 = ConstantLR(optim, factor=0.1, total_iters=400)
scheduler2 = MultiStepLR(optim, milestones=[32000,48000], gamma=0.1)

# in training inner loop
for j, x,t in enumerate(dl):
        optim.zero_grad()
        out=model(x)
        loss = criteria(out,t.long())
        loss.backward()
        optim.step()
        totalLoss += loss.item()
        if i*len(dl) + j <400:
            scheduler1.step()
            scheduler2.step()
        else:
            scheduler2.step()

I’m not sure that if statement is necessary anymore.
also, You can use LAMBDALR to do anything you want.

Imahn · December 27, 2021, 8:33pm

Hi! Actually, I think that the if-statement is indeed not needed:

scheduler1 = ConstantLR(optimizer, factor=0.01, total_iters=400)
scheduler2 = MultiStepLR(optimizer, milestones=[32000, 48000], gamma=0.1)
chained_scheduler = ChainedScheduler([scheduler1, scheduler2])

for (batch_idx, data, labels) in enumerate(train_loader): 
     ...
    chained_scheduler.step()

If I can ask a brief follow-up question: I had actually tried using the LambdaLR for this problem as well, but at the end, I didn’t really know how to implement it… I’d be happy if you showed me how! (-:

mMagmer · December 28, 2021, 6:43am

this should work.

def _lr_lambda(current_step):
        """
        _lr_lambda returns a multiplicative factor given an interger parameter epochs.
        """

        if current_step < 400:
            _lr =.1
        elif current_step < 32000:
            _lr = 1
        elif current_step < 48000:
            _lr = .1
        else:
            _lr = .01

        return _lr

scheduler = LambdaLR(optimizer, _lr_lambda, last_epoch)