A question about optimizer.param_groups in the step function

I want to rewrite the step function to suit my own needs.
In this function, I need to create new tensors with the same sizes as all of the weight tensors (excluding the bias tensors).
I was hoping to get the sizes from the elements of param_groups, but I don't understand what this dict's elements mean.
Can someone explain this, or point me to where I can learn more about it?

When you initialize the optimizer using

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

or similar, PyTorch creates one param_group. The learning rate is accessible via param_group['lr'] and the list of parameters is accessible via param_group['params'].
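
For example (a minimal sketch with a toy model of my own, not from your code), you can inspect that single group like this:

import torch

model = torch.nn.Linear(4, 2)                   # toy model: one weight, one bias
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

group = optimizer.param_groups[0]               # SGD created exactly one group here
print(group['lr'])                              # 0.1
print([p.size() for p in group['params']])      # [torch.Size([2, 4]), torch.Size([2])]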

If you want different learning rates for different parameters, you can initialize the optimizer like this:

optim.SGD([
    {'params': model.base.parameters()},
    {'params': model.classifier.parameters(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)

This creates two parameter groups with different learning rates: model.base's parameters use the default lr of 1e-2, while model.classifier's parameters use 1e-3. That is the reason for having param_groups.
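
Each group carries its own hyperparameters, so you can also read or change them per group after construction. A small sketch (assuming the two-group optimizer above):

for param_group in optimizer.param_groups:
    print(param_group['lr'])     # 1e-2 for model.base, 1e-3 for model.classifier

# hyperparameters can also be updated per group, e.g. decaying every lr by 10x
for param_group in optimizer.param_groups:
    param_group['lr'] *= 0.1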

You might find it useful to read the source for SGD: http://pytorch.org/docs/0.3.1/_modules/torch/optim/sgd.html#SGD


Thanks for your reply. I still have two questions about the source code:

for p in group['params']:
    if p.grad is None:
        continue
    d_p = p.grad.data
    if weight_decay != 0:
        d_p.add_(weight_decay, p.data)
    if momentum != 0:
        param_state = self.state[p]

Problem 1: Is each p in param_groups a weight or bias tensor (some kind of Variable) that holds the data, the grad, and other attributes? So p1 would be conv1's weight, p2 conv1's bias, p3 conv2's weight, and p4 conv2's bias?
Problem 2: Is there only one group, which includes all parameters such as weights, biases, weight decay, and so on? If so, why is there an "s" in the name param_group(s)? It's funny.

Please use the code formatting tool in future.

  1. Each p is one of the parameter Variables of the model. p.grad is the Variable containing the gradients for that parameter.

  2. There will be several param_groups if you specify different learning rates for different parameters when you initialize the optimizer (as explained above).
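
Coming back to your original goal (tensors shaped like the weights but not the biases), a minimal sketch inside a custom step could look like the following; note that the p.dim() == 1 test for spotting biases is my own heuristic, not something taken from the SGD source:

for group in self.param_groups:
    for p in group['params']:
        if p.grad is None:
            continue
        if p.dim() == 1:                   # bias (and other 1-D) parameters: skip them
            continue
        buf = torch.zeros_like(p.data)     # new tensor with the same size as this weight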

Thank you very much!
I will learn how to use that tool. Thank you for your reply and advice!

What if I want to use different optimizers for different param groups?

Do I have to define two optimizers or is there any other way?
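
For example, would it have to look roughly like this (just a sketch, with model.base and model.classifier standing in for my two parameter sets)?

optimizer_base = torch.optim.SGD(model.base.parameters(), lr=1e-2, momentum=0.9)
optimizer_classifier = torch.optim.Adam(model.classifier.parameters(), lr=1e-3)

loss.backward()
optimizer_base.step()
optimizer_classifier.step()
optimizer_base.zero_grad()
optimizer_classifier.zero_grad()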
