How to set different learning rates for weight and bias in one layer?

In Caffe, we can set different learning rates for the weight and bias of one layer.
For example:

layer {
    name: "conv2"
    type: "Convolution"
    bottom: "bn_conv2"
    top: "conv2"
    # first param block: weight
    param {
        lr_mult: 1.000000
    }
    # second param block: bias
    param {
        lr_mult: 0.100000
    }
    convolution_param {
        num_output: 64
        kernel_size: 3
        stride: 1
        pad: 1
        weight_filler {
            type: "msra"
        }
        bias_filler {
            type: "constant"
            value: 0
        }
    }
}

The learning rate of the weight and of the bias is the base learning rate * lr_mult.

In PyTorch, is it possible to set different learning rates for the weight and bias in one layer?
How would I write the code?


This might help

Thank you! I read the docs. The example seems to set different learning rates for different layers. The docs say we can use a dict per parameter group to set the learning rate for different layers.
I’m new to PyTorch. Maybe there is a way to set a weight-wise and bias-wise learning rate, but I can’t find it.
Would you please tell me more about this? Thank you.

The example shows how to set different options for layer.parameters(); you just need to dig a little deeper into the details.

E.g. for a Linear layer, the weight and bias parameters are named mylayer.weight and mylayer.bias.

optimizer = optim.SGD([
                {'params': mylayer.weight},
                {'params': mylayer.bias, 'lr': 1e-3}
            ], lr=1e-2, momentum=0.9)
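
As a minimal, self-contained sketch of the snippet above: the names base_lr and lr_mult below are only illustrative, borrowed from the Caffe example, and are not PyTorch options. The per-parameter groups could be built and checked roughly like this:

```python
import torch.nn as nn
import torch.optim as optim

mylayer = nn.Linear(4, 2)

# Hypothetical names mirroring Caffe's base_lr * lr_mult convention.
base_lr = 1e-2
lr_mult = {"weight": 1.0, "bias": 0.1}

optimizer = optim.SGD(
    [
        {"params": [mylayer.weight], "lr": base_lr * lr_mult["weight"]},
        {"params": [mylayer.bias], "lr": base_lr * lr_mult["bias"]},
    ],
    lr=base_lr,    # default for any group that does not set its own lr
    momentum=0.9,
)

# Each group keeps its own learning rate.
for group in optimizer.param_groups:
    print(group["lr"], [tuple(p.shape) for p in group["params"]])
```

Inspecting optimizer.param_groups is a quick way to confirm that the weight and the bias ended up with the rates you intended.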

Thank you so much for your patient guidance! I tried this code. It reports an error like this:

Traceback (most recent call last):
  File "/home/mitc/pycharm-2017.3.3/helpers/pydev/", line 1668, in <module>
  File "/home/mitc/pycharm-2017.3.3/helpers/pydev/", line 1662, in main
    globals =['file'], None, None, is_module)
  File "/home/mitc/pycharm-2017.3.3/helpers/pydev/", line 1072, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/home/mitc/lcy/Pytorch-SR/", line 137, in <module>
    ], lr=0.1, weight_decay=0.0001)
  File "/home/mitc/anaconda2/envs/lcy-pytorch/lib/python2.7/site-packages/torch/optim/", line 28, in __init__
    super(Adam, self).__init__(params, defaults)
  File "/home/mitc/anaconda2/envs/lcy-pytorch/lib/python2.7/site-packages/torch/optim/", line 61, in __init__
    raise ValueError("can't optimize a non-leaf Variable")
ValueError: can't optimize a non-leaf Variable

My code is:

class Net(nn.Module):
    def __init__(self):#1,3,11,13,1
        super(Net, self).__init__()
        self.layer11 = nn.Sequential(
        self.layer21 = nn.Sequential(
            nn.BatchNorm3d(num_features=16, momentum=0.999, affine=False),
            nn.Conv3d(in_channels=16, out_channels=16, kernel_size=(3, 3, 3), padding=(1, 1, 1), bias=True))

    def forward(self, x, residual):
        #residual = x1
        out = self.layer11(x)
        out = self.layer21(out)
        out = self.layer22(out)
        out = torch.add(out, residual)
        return out

if __name__=="__main__":
    net = Net()
    optimizer = optim.Adam([
                {'params': net.layer11[2].weight},
                {'params': net.layer11[2].bias, 'lr': 0.01}
            ], lr=0.1, weight_decay=0.0001)

This toy example works.

import torch
import torch.nn as nn
import torch.optim as optim
from torch.autograd import Variable

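class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.layer = nn.Linear(1, 1)
        # Start both parameters at 1 so the update is easy to read off
        # (assumed; this matches the output shown below).
        self.layer.weight.data.fill_(1)
        self.layer.bias.data.fill_(1)

    def forward(self, x):
        return self.layer(x)

if __name__=="__main__":
    net = Net()
    # One parameter group per tensor; the bias group overrides the default lr.
    optimizer = optim.Adam([
                {'params': net.layer.weight},
                {'params': net.layer.bias, 'lr': 0.01}
            ], lr=0.1, weight_decay=0.0001)
    out = net(Variable(torch.Tensor([[1]])))
    out.backward()    # gradients of the output w.r.t. weight and bias
    optimizer.step()  # apply one Adam update
    print("weight", net.layer.weight.data.numpy(), "grad", net.layer.weight.grad.data.numpy())
    print("bias", net.layer.bias.data.numpy(), "grad", net.layer.bias.grad.data.numpy())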

Output is

weight [[ 0.90000004]] grad [[ 1.]]
bias [ 0.99000001] grad [ 1.]

As you can see, weight has been updated by ~0.1 * weight.grad and bias has been updated using ~0.01 * bias.grad.

The error you get suggests that you have asked the optimiser to optimise a Variable that isn’t a parameter of your model. But your partial code sample seems fine.
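
One way to check for this, as a rough sketch: parameters registered on a module are leaf tensors, while anything computed from them is not, and the optimizer refuses the latter. The layer and the scaled tensor below are made up for illustration, and newer PyTorch releases may word the message as "non-leaf Tensor" instead of "non-leaf Variable".

```python
import torch
import torch.nn as nn
import torch.optim as optim

layer = nn.Linear(3, 3)
scaled = layer.weight * 2  # result of an operation, so not a leaf

print(layer.weight.is_leaf)  # True
print(scaled.is_leaf)        # False

# Passing the derived tensor to an optimizer is rejected.
try:
    optim.Adam([{"params": [scaled]}], lr=0.1)
    raised = False
except ValueError as err:
    raised = True
    print("optimizer rejected it:", err)
```

Printing is_leaf for everything you hand to the optimizer is a cheap way to find which entry triggers the error.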


Thank you so much!
By running your code, I found that there are bugs in PyTorch version 0.1.12.
I changed the PyTorch version and it worked.

Hello. Your solution is correct.
But I ran into a problem because the model has too many embeddings like “”
So I always get an error when I use your method, because of the syntax of “batch0.0”.
What should I do to iterate over the weights and biases of this model so I can set different learning rates for them?

Try “batch0[0]” instead of “batch0.0”.
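
If indexing every submodule by hand gets tedious, another option is to loop over named_parameters() and split on the name. This is only a sketch: the Net class below is a hypothetical stand-in (only the attribute name batch0 comes from the question), and the learning rates are arbitrary.

```python
import torch.nn as nn
import torch.optim as optim

# Hypothetical nested model; only the name "batch0" is taken from the question.
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.batch0 = nn.Sequential(nn.Conv2d(1, 4, 3), nn.ReLU(), nn.Conv2d(4, 4, 3))
        self.fc = nn.Linear(4, 2)

net = Net()

weights, biases = [], []
for name, param in net.named_parameters():
    # Names look like "batch0.0.weight" or "fc.bias".
    if name.endswith("bias"):
        biases.append(param)
    else:
        weights.append(param)

optimizer = optim.SGD(
    [{"params": weights}, {"params": biases, "lr": 1e-3}],
    lr=1e-2, momentum=0.9,
)
print(len(weights), len(biases))  # 3 3
```

Note that this puts every parameter whose name does not end in "bias" into the weight group, which includes the scale parameter of affine batch-norm layers; filter on the name further if that is not what you want.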

gives a solution for weight-wise and bias-wise learning rate setting.
Just use the function get_parameters():

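def get_parameters(model, bias=False):
    import torch.nn as nn
    # Module types that have no parameters to yield. The original tuple was
    # cut off in this post; the entries below are illustrative placeholders.
    # Add your own container/model classes here as needed.
    modules_skipped = (
        nn.ReLU,
        nn.MaxPool2d,
        nn.Dropout2d,
        nn.Sequential,
    )
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            if bias:
                yield m.bias
            else:
                yield m.weight
        elif isinstance(m, nn.ConvTranspose2d):
            # weight is frozen because it is just a bilinear upsampling
            if bias:
                assert m.bias is None
        elif isinstance(m, modules_skipped):
            continue
        else:
            raise ValueError('Unexpected module: %s' % str(m))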
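
A sketch of how such a generator could be handed to an optimizer. Here conv_params is a shortened stand-in for get_parameters(), written only for this example, and the model and hyperparameter values are arbitrary assumptions.

```python
import torch.nn as nn
import torch.optim as optim

def conv_params(model, bias=False):
    # Shortened, hypothetical stand-in for get_parameters(): Conv2d layers only.
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            if bias:
                if m.bias is not None:
                    yield m.bias
            else:
                yield m.weight

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Conv2d(8, 8, 3))

# Group 0: weights with the default settings.
# Group 1: biases with their own learning rate and no weight decay.
optimizer = optim.SGD(
    [
        {"params": conv_params(model, bias=False)},
        {"params": conv_params(model, bias=True), "lr": 2e-2, "weight_decay": 0},
    ],
    lr=1e-2, momentum=0.9, weight_decay=5e-4,
)
```

The optimizer turns each generator into a list when it builds the group, so the generators do not need to be materialised first.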

Thank you very much. Your code is really simple and solves my problem well.
I tried all night to construct a big for loop to implement this. But your method is amazing. Thank you!

Can you just multiply the gradients for specific layers after loss.backward() and before optimizer.step() by a constant? Would that have the same effect?
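
For plain SGD without momentum or weight decay the update is param -= lr * grad, so scaling the gradient by a constant gives the same step as scaling the learning rate by it. The sketch below checks that on a made-up layer and input. It does not carry over to adaptive optimizers such as Adam, whose step size is largely insensitive to the scale of the gradient.

```python
import copy
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
layer_a = nn.Linear(2, 1)
layer_b = copy.deepcopy(layer_a)  # identical starting parameters
x = torch.randn(5, 2)

# (a) smaller learning rate for the bias via a parameter group
opt_a = optim.SGD([{"params": [layer_a.weight]},
                   {"params": [layer_a.bias], "lr": 0.01}], lr=0.1)
layer_a(x).sum().backward()
opt_a.step()

# (b) one learning rate, bias gradient scaled by 0.1 before the step
opt_b = optim.SGD(layer_b.parameters(), lr=0.1)
layer_b(x).sum().backward()
layer_b.bias.grad.mul_(0.1)
opt_b.step()

same = (torch.allclose(layer_a.bias, layer_b.bias)
        and torch.allclose(layer_a.weight, layer_b.weight))
print(same)
```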