How can I specify the number of epochs after which the learning rate is decayed in the Exponential scheduler?

Hi, is it possible, like in TensorFlow, to specify after how many epochs the learning rate gets decayed?
I looked into the documentation and noticed that the current implementation only decays the learning rate after every epoch, and there is no way to specify anything else.
I thought maybe I could add a condition in the training loop and write something like:

if epoch % 2 == 0:
    scheduler.step()

but since the implementation always uses last_epoch internally, I'm not sure whether skipping step() calls like this would actually yield a different result.
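
To make the idea concrete, this is roughly what I have in mind (the model and the optimizer here are just toy placeholders, not my actual setup):

import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import ExponentialLR

model = nn.Linear(10, 2)                              # placeholder model
optimizer = optim.SGD(model.parameters(), lr=0.045)
scheduler = ExponentialLR(optimizer, gamma=0.98)

for epoch in range(20):
    # ... one epoch of training would go here ...
    if epoch % 2 == 0:
        scheduler.step()   # only decay the learning rate every second epoch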

More information:
The reason I'm asking is that I want to reproduce the settings TensorFlow uses to train MobileNet. Here is the TensorFlow configuration:

--model_name="mobilenet_v2"
--learning_rate=0.045 * NUM_GPUS        # slim internally averages clones, so we compensate
--preprocessing_name="inception_v2"
--label_smoothing=0.1
--moving_average_decay=0.9999
--batch_size=96
--num_clones=NUM_GPUS                   # you can use any number between 1 and 8, depending on your hardware setup
--learning_rate_decay_factor=0.98
--num_epochs_per_decay=2.5 / NUM_GPUS   # train_image_classifier does per-clone epochs

And this is the training script; as you can see, several hyperparameters are given. What I'm specifically asking about is the exponential learning rate decay, whose implementation is given here:

  decayed_learning_rate = learning_rate *
                          decay_rate ^ (global_step / decay_steps)

Based on this, global_step is basically a counter for the number of batches read so far (i.e., it holds the total number of steps taken during training).
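
If I understand it correctly, decay_steps is just the number of steps per epoch times num_epochs_per_decay, so for a single GPU the schedule would look roughly like this (the dataset size is only an assumed example, not something taken from the script):

# rough sketch of how I read the formula above; the dataset size is an assumption
num_train_samples = 1281167          # e.g. the ImageNet-1k training set
batch_size = 96
num_epochs_per_decay = 2.5           # single GPU, i.e. NUM_GPUS = 1
learning_rate = 0.045
decay_rate = 0.98

steps_per_epoch = num_train_samples // batch_size            # ~13345 batches per epoch
decay_steps = int(steps_per_epoch * num_epochs_per_decay)    # ~33362 steps per decay period

def decayed_learning_rate(global_step):
    # continuous exponential decay in terms of the global step
    return learning_rate * decay_rate ** (global_step / decay_steps)

print(decayed_learning_rate(0))            # 0.045
print(decayed_learning_rate(decay_steps))  # 0.045 * 0.98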

So basically TensorFlow decays by iteration while PyTorch decays by epoch. My first question is: wouldn't that make the outcome different?

I tried to implement the exponential scheduler using iterations like this:

def exp_lr_scheduler(optimizer, iteration, lr_decay_iter=6400,
                     max_iter=2400000, gamma=0.96):
    """Exponential decay of the learning rate.

    :param optimizer: the optimizer whose learning rate is updated
    :param iteration: the current iteration
    :param lr_decay_iter: how frequently the decay occurs, default is 6400 (batch size of 64)
    :param max_iter: the maximum number of iterations
    :param gamma: the ratio by which the decay happens
    """
    if iteration > max_iter:
        return optimizer

    # fetch the last lr (at the beginning it is the initial learning rate)
    lr = 0
    for param_group in optimizer.param_groups:
        lr = param_group['lr']

    # apply the exponential decay and write it back to all param groups
    lr = lr * gamma ** (iteration / lr_decay_iter)
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr

    return lr
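
For context, this is roughly how I imagined calling it, with a counter that keeps incrementing across epochs (the model and data below are just a toy stand-in to show the call pattern):

import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 2)                              # toy model
optimizer = optim.SGD(model.parameters(), lr=0.045)
criterion = nn.CrossEntropyLoss()

global_step = 0
for epoch in range(3):
    for _ in range(5):                                # stand-in for iterating over a DataLoader
        images = torch.randn(64, 10)
        targets = torch.randint(0, 2, (64,))
        optimizer.zero_grad()
        loss = criterion(model(images), targets)
        loss.backward()
        optimizer.step()
        global_step += 1
        current_lr = exp_lr_scheduler(optimizer, global_step)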

However, I do not actually have such a global iteration counter in my training code, so can I simply multiply the epoch by the current iteration and use that instead?
Meaning:

def exp_lr_scheduler(optimizer, iteration, epoch, lr_decay_iter=6400,
                     max_iter=2400000, gamma=0.96):
    """Exponential decay of the learning rate.

    :param optimizer: the optimizer whose learning rate is updated
    :param iteration: the current iteration within the epoch
    :param epoch: the current epoch
    :param lr_decay_iter: how frequently the decay occurs, default is 6400 (batch size of 64)
    :param max_iter: the maximum number of iterations
    :param gamma: the ratio by which the decay happens
    """
    if iteration > max_iter:
        return optimizer

    # fetch the last lr (at the beginning it is the initial learning rate)
    lr = 0
    for param_group in optimizer.param_groups:
        lr = param_group['lr']

    # same as above, but the exponent uses epoch * iteration instead of a global step
    lr = lr * gamma ** ((epoch * iteration) / lr_decay_iter)
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr

    return lr
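
The call would then be something like exp_lr_scheduler(optimizer, i, epoch), where i is the batch index coming from enumerate(train_loader) inside the current epoch (train_loader being a placeholder for my actual data loader).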

What should I do here? Which way is the correct one?

Can anybody please help me with this?
