How could I design my own optimizer scheduler

Hi,

If I need to modify the schedule of my optimizer in an irregular way, how could I do it?

For example, I have an Adam optimizer, and I need it to keep working with its default parameters until the 1000th iteration. Then I need to change beta1 to 0.3, and for the rest of training I need its learning rate to decay by a factor of 0.9999 per iteration. How could I do this with PyTorch?

You can write a custom function to change the parameters of the optimizer as training goes on.

def adjust_optim(optimizer, n_iter):
    # at iteration 1000, set beta1 to 0.3 (keep the current beta2)
    if n_iter == 1000:
        optimizer.param_groups[0]['betas'] = (0.3, optimizer.param_groups[0]['betas'][1])
    # after iteration 1000, decay the learning rate by 0.9999 every iteration
    if n_iter > 1000:
        optimizer.param_groups[0]['lr'] *= 0.9999

If you are using multiple param groups, change 0 to i for the ith group.
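For completeness, here is a minimal sketch of how it could be called inside a training loop. The toy model and random data are just placeholders for your own setup:

import torch
import torch.nn as nn
import torch.optim as optim

# toy model and data, only to make the sketch runnable
model = nn.Linear(10, 1)
optimizer = optim.Adam(model.parameters())  # default lr=1e-3, betas=(0.9, 0.999)
loss_fn = nn.MSELoss()

for n_iter in range(1, 2001):
    x, y = torch.randn(32, 10), torch.randn(32, 1)
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    optimizer.step()

    adjust_optim(optimizer, n_iter)  # beta1 -> 0.3 at iter 1000, lr *= 0.9999 afterwards

print(optimizer.param_groups[0]['betas'], optimizer.param_groups[0]['lr'])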

Best,


Many thanks. BTW, how could I tell whether I am using param_groups or defaults? It appears that I can print out both optim.defaults and optim.param_groups; they both exist in my optimizer.

It depends on how you construct the optimizer.
If you do

optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

that means you only have one param group.
If you do

optim.SGD([
    {'params': model.base.parameters()},
    {'params': model.classifier.parameters(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)

that means you have two param groups.
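If you want to check this at runtime, you can simply inspect optimizer.param_groups. A small sketch, with a toy module standing in for model.base / model.classifier:

import torch.nn as nn
import torch.optim as optim

# toy module mirroring the structure in the snippet above
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.base = nn.Linear(10, 10)
        self.classifier = nn.Linear(10, 2)

    def forward(self, x):
        return self.classifier(self.base(x))

model = Net()
optimizer = optim.SGD([
    {'params': model.base.parameters()},
    {'params': model.classifier.parameters(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)

print(len(optimizer.param_groups))       # 2
print(optimizer.param_groups[0]['lr'])   # 0.01  (falls back to the outer lr)
print(optimizer.param_groups[1]['lr'])   # 0.001 (overridden for the classifier)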

Sorry, I edited the first post. It seems that changing .defaults won’t change the optimizer setup for param group 0 (when there is only one group).
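You can verify that quickly with a throwaway optimizer (hypothetical values):

import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 2)
optimizer = optim.SGD(model.parameters(), lr=0.01)

optimizer.defaults['lr'] = 0.5          # only edits the stored constructor defaults
print(optimizer.param_groups[0]['lr'])  # still 0.01 -- this is what step() uses

optimizer.param_groups[0]['lr'] = 0.5   # changing the param group is what takes effect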

Sorry, I should have searched enough before answering the question.


Thanks a lot, I seem to get the idea now.

Check out the function exp_lr_scheduler in my fine-tuning tutorial linked below. That lets you decay the LR. I just multiply it by a constant, but you can do anything fancy you want using the same code structure.
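For reference, a function in that spirit could look roughly like this. This is a sketch, not the exact code from the tutorial, and the argument names (init_lr, decay_rate, epoch_step) are made up:

def exp_lr_decay(optimizer, epoch, init_lr=0.001, decay_rate=0.1, epoch_step=7):
    """Multiply the initial LR by decay_rate every epoch_step epochs."""
    lr = init_lr * (decay_rate ** (epoch // epoch_step))
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr
    return optimizer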

Thanks for your helpful tutorial!! However, I have a question that may not be so relevant to this thread.
I noticed that Caffe allows setting different learning rates for the weight and bias tensors of a conv layer (usually lr for the weight and 2*lr for the bias). Could I do this with PyTorch without too much tedious work constructing the param_groups for the optimizer?
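To make the question concrete, here is a rough sketch of the kind of manual param_groups construction I mean, using a toy conv model and named_parameters() to split weights from biases:

import torch.nn as nn
import torch.optim as optim

# toy conv net standing in for the real model
model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Conv2d(16, 32, 3))

base_lr = 0.01
weights = [p for name, p in model.named_parameters() if not name.endswith('bias')]
biases = [p for name, p in model.named_parameters() if name.endswith('bias')]

optimizer = optim.SGD([
    {'params': weights, 'lr': base_lr},
    {'params': biases, 'lr': 2 * base_lr},  # Caffe-style 2x lr for biases
], lr=base_lr, momentum=0.9)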