How to do exponential learning rate decay in PyTorch?

Hi! I want to port the TensorFlow code below to PyTorch:

lr = tf.train.exponential_decay(start_lr, global_step, 3000, 0.96, staircase=True)
optimizer = tf.train.AdamOptimizer(learning_rate=lr, epsilon=0.1)

But I don’t know what the PyTorch counterpart of exponential learning rate decay is. Can anyone tell me? Thanks a lot!


This should be interesting for you!

Ah it’s interesting how you make the learning rate scheduler first in TensorFlow, then pass it into your optimizer.

In PyTorch, we first make the optimizer:

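import torch
import torchvision

my_model = torchvision.models.resnet50()

# Pass the model's parameters via my_model.parameters() (nn.Module has no .params attribute)
my_optim = torch.optim.Adam(params=my_model.parameters(), lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, amsgrad=False)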

Note that optimizers in PyTorch typically take the parameters of your model as input, so an example model is defined above. The arguments I passed to Adam are the defaults; you can change lr to whatever your starting learning rate should be.

After making the optimizer, you want to wrap it inside a lr_scheduler:

decayRate = 0.96
my_lr_scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer=my_optim, gamma=decayRate)

Then train as usual in PyTorch:

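for e in range(num_epochs):  # num_epochs: however many epochs you train for
    train_epoch()  # steps my_optim once per batch
    valid_epoch()

    my_lr_scheduler.step()  # decays the learning rate once per epoch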

Note that the my_lr_scheduler.step() call is what decays your learning rate, once per epoch here. train_epoch() and valid_epoch() pass over your training data and your test/valid data. Be sure to still step your optimizer for every batch of training data! In other words, you still need the my_optim.zero_grad(), loss.backward(), and my_optim.step() calls. Just don’t confuse the step of your optimizer with the step of your lr_scheduler; you need both.
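To make the two kinds of step concrete, here is a minimal sketch of that loop written out in full. The linear model, the random batches, and the names model, batches, and lr_history are placeholders I made up for illustration; swap in your own model and data loader.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Toy stand-ins (assumptions): a linear model and four random batches.
model = nn.Linear(10, 1)
batches = [(torch.randn(8, 10), torch.randn(8, 1)) for _ in range(4)]
loss_fn = nn.MSELoss()

start_lr, decay_rate, num_epochs = 0.001, 0.96, 5
optimizer = torch.optim.Adam(model.parameters(), lr=start_lr)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=decay_rate)

lr_history = []
for epoch in range(num_epochs):
    for inputs, targets in batches:
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        optimizer.step()  # once per batch: updates the weights
    # Record the learning rate that was used during this epoch.
    lr_history.append(optimizer.param_groups[0]["lr"])
    scheduler.step()      # once per epoch: lr <- lr * decay_rate
```

With this setup, lr_history[e] should equal start_lr * decay_rate ** e, so the rate shrinks by 4% after every epoch.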

Here’s a good example:
TorchVision Object Detection Finetuning Tutorial


Like your explanation, that works for me! :grinning: Thanks.


Hello Audrey,

I also want to implement exponential decay, and I need the learning rate to be printed for each epoch. How did you implement it?

Here is some sample code, @ku294714:

import torch.optim as optim

# net is your model (any torch.nn.Module)
optimizer = optim.SGD(net.parameters(), lr=0.1)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)

for i in range(15):
    lr = scheduler.get_last_lr()[0]  # recent PyTorch (1.5+, iirc) uses get_last_lr(); older versions used get_lr()
    lr1 = optimizer.param_groups[0]["lr"]  # either this line or the one above; both should give the same value
    print(i, lr, lr1)
    scheduler.step()
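For reference, the values that StepLR sample prints follow a simple closed form: the rate is halved after every 5 scheduler steps. Below is a small self-contained check of that, assuming a throwaway nn.Linear as the model; the names observed and expected are mine, not from the post above.

```python
import torch
from torch import nn, optim

net = nn.Linear(2, 1)  # placeholder model (assumption)
optimizer = optim.SGD(net.parameters(), lr=0.1)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)

observed = []
for i in range(15):
    observed.append(scheduler.get_last_lr()[0])
    # No gradients exist yet, so this step changes nothing; calling it before
    # scheduler.step() just follows PyTorch's expected call order.
    optimizer.step()
    scheduler.step()

# Closed form of the staircase: lr_i = 0.1 * 0.5 ** (i // 5)
expected = [0.1 * 0.5 ** (i // 5) for i in range(15)]
```

So the first five iterations run at 0.1, the next five at 0.05, and the last five at 0.025.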

Sample code for ExponentialLR, since the scheduler does not have a built-in step size:

args.learning_rate = 0.001  # i.e. start_lr
args.learning_rate_decay_factor = 0.96
args.learning_rate_decay_step = 3000

optim = torch.optim.Adam(params=model.parameters(), lr=args.learning_rate)

lr_scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer=optim, gamma=args.learning_rate_decay_factor, last_epoch=-1)

for epoch in range(1, args.num_epoch + 1):

    # Do the forward pass here and compute loss

    optim.zero_grad()
    loss.backward()
    optim.step()

    # Decay the learning rate only once every learning_rate_decay_step iterations
    if epoch % args.learning_rate_decay_step == 0:
        lr_scheduler.step()
    # For debugging: print the most recent learning rate
    print(lr_scheduler.get_last_lr())

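With this loop, the learning rate is multiplied by 0.96 once every 3000 iterations, which mirrors staircase=True in the TensorFlow snippet. Note that the loop variable is named epoch here; since TensorFlow’s global_step counts optimizer updates, run this loop once per batch rather than once per epoch if you want the same schedule.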
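Another way to get the same schedule, as a rough sketch: LambdaLR lets you write the TensorFlow formula, lr = start_lr * decay_rate ** (global_step / decay_steps), directly and step the scheduler on every batch. The helper name make_tf_style_decay and the nn.Linear model are my own placeholders, not part of PyTorch or of the code above; eps=0.1 simply mirrors the epsilon argument from the original question.

```python
import torch
from torch import nn


def make_tf_style_decay(optimizer, decay_steps, decay_rate, staircase=True):
    """Hypothetical helper: scales the base lr by decay_rate ** (step / decay_steps)."""
    def lr_factor(step):
        # staircase=True uses integer division, so the rate drops in discrete jumps.
        exponent = step // decay_steps if staircase else step / decay_steps
        return decay_rate ** exponent
    return torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_factor)


model = nn.Linear(4, 1)  # placeholder model (assumption)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, eps=0.1)
scheduler = make_tf_style_decay(optimizer, decay_steps=3000, decay_rate=0.96)

for global_step in range(6000):  # one iteration per batch, not per epoch
    # The forward pass, optimizer.zero_grad() and loss.backward() would go here.
    optimizer.step()
    scheduler.step()

final_lr = scheduler.get_last_lr()[0]  # 0.001 * 0.96 ** 2 after 6000 steps
```

Passing staircase=False gives the smooth version of the decay instead. StepLR(optimizer, step_size=3000, gamma=0.96), stepped once per batch, should produce the same staircase without a custom function.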