I want to rewrite a step function to suit my own needs.
In this function, I need to create some new tensors with the same sizes as all the weight tensors (excluding the bias tensors).
I hope to get the sizes from the elements in param_groups, but I don't understand what this dict's elements mean.
Can someone explain this, or point me to where I can learn more about it?
When you initialize the optimizer using
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
or similar, PyTorch creates one param_group. The learning rate is accessible via param_group['lr']
and the list of parameters is accessible via param_group['params']
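A minimal sketch of inspecting those two keys, using a small nn.Linear as a stand-in for any model:

```python
import torch

# A tiny model; any nn.Module works the same way.
model = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# With a plain parameter iterable, there is exactly one param_group.
group = optimizer.param_groups[0]
print(len(optimizer.param_groups))  # 1
print(group['lr'])                  # 0.1
print(len(group['params']))         # 2: the Linear layer's weight and bias
```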
If you want different learning rates for different parameters, you can initialise the optimizer like this:
optim.SGD([
    {'params': model.base.parameters()},
    {'params': model.classifier.parameters(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)
This creates two parameter groups with different learning rates. That is the reason for having param_groups.
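To make that concrete, here is a runnable sketch with a hypothetical model that has `base` and `classifier` submodules, mirroring the snippet above; the first group falls back to the default lr, the second overrides it:

```python
import torch
import torch.nn as nn

# Hypothetical model with `base` and `classifier` submodules,
# standing in for whatever the real model looks like.
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.base = nn.Linear(8, 4)
        self.classifier = nn.Linear(4, 2)

model = Net()
optimizer = torch.optim.SGD([
    {'params': model.base.parameters()},
    {'params': model.classifier.parameters(), 'lr': 1e-3},
], lr=1e-2, momentum=0.9)

for i, group in enumerate(optimizer.param_groups):
    # group 0 uses the default lr=1e-2; group 1 overrides it with 1e-3
    print(i, group['lr'], len(group['params']))
```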
You might find reading the source for SGD to be useful. http://pytorch.org/docs/0.3.1/_modules/torch/optim/sgd.html#SGD
Thanks for your reply. I still have two questions about the source code:
for p in group['params']:
    if p.grad is None:
        continue
    d_p = p.grad.data
    if weight_decay != 0:
        d_p.add_(weight_decay, p.data)
    if momentum != 0:
        param_state = self.state[p]
Problem 1: Is each p in param_groups something holding a weight's or bias's tensor data, its grad, or some other parameter (some kind of Variable)? So p1 is conv1's weight, p2 is conv1's bias, p3 is conv2's weight, and p4 is conv2's bias?
Problem 2: Is there only one group which includes all parameters such as weights, biases, weight decay, and so on? If so, why is there an "s" in the name param_group(s)? It's funny.
Please use the code formatting tool in future.
-
Each p is one of the parameter Variables of the model. p.grad is the Variable containing the gradients for that parameter.
-
There will be several param_groups if you specify different learning rates for different parameters when you initialize the optimizer (as explained above).
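Coming back to the original goal of allocating tensors sized like the weight tensors only, a minimal sketch is below. It assumes the common convention that bias tensors are 1-D while weight tensors have two or more dimensions, which holds for standard Linear and Conv layers but is only a heuristic:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Conv2d(8, 16, 3))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Allocate one zero buffer per weight tensor, skipping biases by
# dimensionality: biases are 1-D, weights have 2+ dimensions.
buffers = [
    torch.zeros_like(p)
    for group in optimizer.param_groups
    for p in group['params']
    if p.dim() > 1
]
print([tuple(b.shape) for b in buffers])  # shapes of the two conv weights
```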
Thank you very much!
I will learn how to use that tool, thank you for your reply and advice!
What if I want to use different optimizers for different param groups?
Do I have to define two optimizers or is there any other way?