I have essentially the same problem as this post about per-parameter options in the optimizer. I am using the C++ API (libtorch) instead of Python. Is there an example anywhere of how to set the learning rate per layer using libtorch? Thanks!
Based on this, this, and this code snippet, you should be able to add new parameter groups via:
auto& params_groups = optimizer.param_groups();
params_groups.push_back(OptimizerParamGroup(params));
and set the learning rate via:
static_cast<torch::optim::AdamOptions&>(adam_optimizer.param_groups()[0].options()).lr(new_lr);
for Adam (you might want to change the cast to the appropriate options type for your optimizer).
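For example, with SGD the cast would target torch::optim::SGDOptions instead (a small sketch, assuming the optimizer is an SGD instance named sgd_optimizer and new_lr is the learning rate you want):
static_cast<torch::optim::SGDOptions&>(sgd_optimizer.param_groups()[0].options()).lr(new_lr);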
Thank you. This just sets the learning rate generically for all layers. I use this call already to change the learning rate for all layers from epoch to epoch. However, I need certain layers to have a different learning rate. I'm porting models from Caffe and they seem to be very sensitive to this. In Caffe it was a per-layer learning rate multiplier, mentioned in the post that I linked to. In Python you can pass in parameters for each layer, but I cannot figure out how to create that same functionality in C++.
You should still be able to use the posted code to set the learning rate for each parameter group (which could be separate layers):
for (auto& param_group : optimizer.param_groups()) {
    static_cast<torch::optim::AdamOptions&>(param_group.options()).lr(new_lr);
}
// or alternatively via the index, as in the first code snippet
So I should push back a unique param_group for each layer of my model? The optimizer by default only has one param_group, not one per layer.
Yes. This would be the same approach used in the Python API (docs for per-parameter options):
optimizer = torch.optim.SGD([
    {'params': [nn.Parameter(torch.zeros(1))]},
    {'params': [nn.Parameter(torch.ones(1))], 'lr': 1e-3}
], lr=1e-2, momentum=0.9)

for param_group in optimizer.param_groups:
    print(param_group)
> {'params': [Parameter containing:
tensor([0.], requires_grad=True)], 'lr': 0.01, 'momentum': 0.9, 'dampening': 0, 'weight_decay': 0, 'nesterov': False}
{'params': [Parameter containing:
tensor([1.], requires_grad=True)], 'lr': 0.001, 'momentum': 0.9, 'dampening': 0, 'weight_decay': 0, 'nesterov': False}
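For completeness, here is a rough libtorch mirror of that snippet. This is just a sketch: the two single-element tensors stand in for real layer parameters, and you would swap SGDOptions for AdamOptions (etc.) to match your optimizer.
// Two parameter groups: the first uses the defaults below, the second overrides lr.
std::vector<torch::optim::OptimizerParamGroup> groups;
groups.emplace_back(std::vector<torch::Tensor>{torch::zeros({1}, torch::requires_grad())});
groups.emplace_back(std::vector<torch::Tensor>{torch::ones({1}, torch::requires_grad())},
                    std::make_unique<torch::optim::SGDOptions>(1e-3));
// Defaults (lr = 1e-2, momentum = 0.9) apply to every group that has no options of its own.
torch::optim::SGD optimizer(std::move(groups), torch::optim::SGDOptions(1e-2).momentum(0.9));
As in the Python version, per-group options override the defaults, and groups created without options inherit them.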
Getting very close, I will post a complete solution once I get it. The new parameter groups I create end up on the CPU instead of the GPU (the model was moved to the GPU prior to optimizer construction, per the documentation). So when I run the model, it complains that the weights are on the CPU. How can I create the new parameters on the GPU instead?
As a side note for anyone interested, I ended up using the call optimizer.add_param_group(), since it handles the creation of the options automatically. The code posted above with params_groups.push_back actually fails when you later try to index into the group's options with "Expected has_options() to be true, but got false".
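For reference, a minimal sketch of that approach; the module names (model, fc1, fc2) and the learning rates are just placeholders here:
// Construct the optimizer with the first layer's parameters and the base options.
torch::optim::Adam optimizer(model->fc1->parameters(), torch::optim::AdamOptions(1e-2));
// add_param_group() clones the default options for the new group automatically.
optimizer.add_param_group(torch::optim::OptimizerParamGroup(model->fc2->parameters()));
// Then give that group its own learning rate.
static_cast<torch::optim::AdamOptions&>(optimizer.param_groups()[1].options()).lr(1e-3);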
Thanks for the share, but my case is a little different.
I create a tensor (not a module) that needs to track gradients, and I want to set a different lr for that tensor. But I run into a problem with params_groups.push_back(tensor); how can I fix it?
torch::Tensor actionLogSTD; // this is the problem tensor
actionLogSTD = register_parameter("actionLogSTD", torch::zeros({1, actionDim}) - 0.5); // requires_grad = true

auto& params_groups = p_optimizer->param_groups();
p_optimizer->parameters().push_back((*p_actor)->actionLogSTD);
params_groups.push_back(torch::optim::OptimizerParamGroup((*p_sharedBILSTM)->parameters()));
params_groups.push_back(torch::optim::OptimizerParamGroup((*p_actor)->actor_fc_net1->parameters()));
params_groups.push_back(torch::optim::OptimizerParamGroup((*p_critic)->critic_fc_net1->parameters()));
params_groups.push_back(torch::optim::OptimizerParamGroup((*p_actor)->actionLogSTD)); // this leads to an error: the tensor type does not match the OptimizerParamGroup constructor
for (auto& param_group : p_optimizer->param_groups())
    static_cast<torch::optim::AdamOptions&>(param_group.options()).lr(lr);
I don’t know what the error message is, but based on the docs I guess you might need to wrap the parameter into a std::vector<Tensor> first before passing it to the OptimizerParamGroup.
Thanks, that really works! I simply changed
params_groups.push_back(torch::optim::OptimizerParamGroup((*p_actor)->actionLogSTD));
to
params_groups.push_back(torch::optim::OptimizerParamGroup({(*p_actor)->actionLogSTD}));
(The braces construct a one-element std::vector<torch::Tensor>, which matches the OptimizerParamGroup constructor.)
// Create the first parameter group with drive_db_
std::vector<torch::Tensor> params1 = { drive_db_ };
// Create custom options for this parameter group
auto options1 = std::make_unique<torch::optim::AdamOptions>(learning_rate);
// Build the list of parameter groups for the optimizer
std::vector<torch::optim::OptimizerParamGroup> param_groups;
param_groups.emplace_back(params1, std::move(options1));
// Construct the optimizer without moving param_groups (the groups are copied)
torch::optim::Adam optimizer(param_groups, torch::optim::AdamOptions(learning_rate)); // base/default learning rate
Unsure how I’d fix this… let’s see