Per-Parameter Options in libtorch C++

I have essentially the same problem as this post about per-parameter options in the optimizer, but I am using the C++ API (libtorch) instead of Python. Is there an example anywhere of how to set the learning rate per layer using libtorch? Thanks!

Based on these code snippets (this, this, and this), you should be able to add new parameter groups via:

auto& params_groups = optimizer.param_groups();
params_groups.push_back(torch::optim::OptimizerParamGroup(params));

and set the learning rate via:

static_cast<torch::optim::AdamOptions&>(adam_optimizer.param_groups()[0].options()).lr(new_lr);

for Adam (you might want to change it to the appropriate optimizer type).
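
For reference, a minimal sketch of reading and updating the first group's learning rate with that cast (assuming an Adam optimizer named adam_optimizer and a placeholder new_lr):

auto& options = static_cast<torch::optim::AdamOptions&>(
    adam_optimizer.param_groups()[0].options());
double current_lr = options.lr();  // getter
options.lr(new_lr);                // setter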

Thank you. This just sets the learning rate generically for all layers; I already use this call to change the learning rate for all layers from epoch to epoch. However, I need certain layers to have a different learning rate. I'm porting models from Caffe and they seem to be very sensitive to this; in Caffe it was a per-layer learning rate multiplier, as mentioned in the post I linked to. In Python you can pass in parameters for each layer, but I cannot figure out how to create that same functionality in C++.

You should still be able to use the posted code to set the learning rate for each parameter group (which could be separate layers):

for (auto& param_group : optimizer.param_groups()) {
  static_cast<torch::optim::AdamOptions&>(param_group.options()).lr(new_lr);
}
// or alternatively via the index, as in the first code snippet
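
To give specific groups their own learning rates, you can also index into the groups rather than looping; a minimal sketch, where the indices and values are placeholders for your own layout:

auto& groups = optimizer.param_groups();
static_cast<torch::optim::AdamOptions&>(groups[0].options()).lr(1e-3);  // e.g. most layers
static_cast<torch::optim::AdamOptions&>(groups[1].options()).lr(1e-4);  // e.g. a sensitive layer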

So I should push back a unique param_group for each layer of my model? The optimizer by default only has one param_group, not one per layer.

Yes. This would be the same approach used in the Python API (docs for per-parameter options):

import torch
import torch.nn as nn

optimizer = torch.optim.SGD([
    {'params': [nn.Parameter(torch.zeros(1))]},
    {'params': [nn.Parameter(torch.ones(1))], 'lr': 1e-3}
], lr=1e-2, momentum=0.9)

for param_group in optimizer.param_groups:
    print(param_group)

> {'params': [Parameter containing:
  tensor([0.], requires_grad=True)], 'lr': 0.01, 'momentum': 0.9, 'dampening': 0, 'weight_decay': 0, 'nesterov': False}
  {'params': [Parameter containing:
  tensor([1.], requires_grad=True)], 'lr': 0.001, 'momentum': 0.9, 'dampening': 0, 'weight_decay': 0, 'nesterov': False}
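
A rough libtorch equivalent of the Python snippet above, as a sketch only (net1/net2 are placeholder modules; note that a group constructed with its own options does not inherit the defaults, so repeat momentum there if you need it):

torch::nn::Linear net1(10, 10), net2(10, 10);  // stand-ins for your layers

std::vector<torch::optim::OptimizerParamGroup> groups;
// First group: no explicit options, so it picks up the optimizer defaults below.
groups.emplace_back(net1->parameters());
// Second group: its own options object with lr = 1e-3.
groups.emplace_back(
    net2->parameters(),
    std::make_unique<torch::optim::SGDOptions>(
        torch::optim::SGDOptions(1e-3).momentum(0.9)));

// Defaults (lr = 1e-2, momentum = 0.9) apply to groups without explicit options.
torch::optim::SGD optimizer(groups, torch::optim::SGDOptions(1e-2).momentum(0.9));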

Getting very close, I will post a complete solution once I get it. The new parameter groups created are going onto the CPU (the model was moved to the GPU prior to optimizer construction per the documentation) instead of the GPU. So when I run the model, it complains that the weights are on the CPU. How can I create the new parameters on the GPU instead?
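
One common cause worth checking, as an assumption about what is happening here: standalone parameter tensors (as opposed to parameters registered on the module that was moved to the GPU) are created on the CPU by default, so they need to be created on, or moved to, the same device before being pushed into a group. A minimal sketch with placeholder names:

// Create the extra leaf parameter directly on the GPU, then group it.
auto extra_param = torch::zeros({1, 10},
    torch::TensorOptions().device(torch::kCUDA).requires_grad(true));
optimizer.add_param_group(torch::optim::OptimizerParamGroup({extra_param}));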

As a side note for anyone interested: I ended up using the call optimizer.add_param_group, since it handles the creation of the options automatically. The code posted above with “params_groups.push_back” actually fails when you try to index into it, with “Expected has_options() to be true, but got false”.
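
For completeness, a minimal sketch of the add_param_group approach described above (model/head and the learning rates are placeholders; the cast assumes Adam):

torch::optim::Adam optimizer(model->parameters(), torch::optim::AdamOptions(1e-3));

// add_param_group clones the default options for the new group,
// so options() is valid afterwards (unlike a bare push_back without options).
optimizer.add_param_group(torch::optim::OptimizerParamGroup(head->parameters()));

// Lower the learning rate for the newly added group only (index 1 here).
static_cast<torch::optim::AdamOptions&>(
    optimizer.param_groups()[1].options()).lr(1e-4);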

Thanks for the share, but my case is a little different.
I create a tensor (not a module) that needs to track gradients, and I want to set a different lr for that tensor. But I run into a problem with “params_groups.push_back(tensor)”. How can I fix it?

    torch::Tensor actionLogSTD;  // this is the problematic tensor
    actionLogSTD = register_parameter("actionLogSTD", torch::zeros({1, actionDim}) - 0.5);  // requires_grad = true via register_parameter

    auto& params_groups = p_optimizer->param_groups();
    p_optimizer->parameters().push_back((*p_actor)->actionLogSTD);
    params_groups.push_back(torch::optim::OptimizerParamGroup((*p_sharedBILSTM)->parameters()));
    params_groups.push_back(torch::optim::OptimizerParamGroup((*p_actor)->actor_fc_net1->parameters()));
    params_groups.push_back(torch::optim::OptimizerParamGroup((*p_critic)->critic_fc_net1->parameters()));
    params_groups.push_back(torch::optim::OptimizerParamGroup((*p_critic)->actionLogSTD));  // this leads to the error: a Tensor does not match the OptimizerParamGroup constructor
    for (auto& param_group : p_optimizer->param_groups())
        static_cast<torch::optim::AdamOptions&>(param_group.options()).lr(lr);

I don’t know what the error message is, but based on the docs I guess you might need to wrap the parameter into a std::vector<Tensor> first before passing it to the OptimizerParamGroup.

Thanks, man! That really works!
I simply changed
"params_groups.push_back(torch::optim::OptimizerParamGroup((*p_actor)->actionLogSTD));"
to
"params_groups.push_back(torch::optim::OptimizerParamGroup({(*p_actor)->actionLogSTD}));"