How to set different learning rate for weight and bias in one layer?

elysion · February 8, 2018, 3:22pm

In Caffe, we can set different learning rates for the weight and the bias of a single layer.
For example:

layer {
    name: "conv2"
    type: "Convolution"
    bottom: "bn_conv2"
    top: "conv2"
    param {
        lr_mult: 1.000000
    }
    param {
        lr_mult: 0.100000
    }
    convolution_param {
        num_output: 64
        kernel_size: 3
        stride: 1
        pad: 1
        weight_filler {
            type: "msra"
        }
        bias_filler {
            type: "constant"
            value: 0
        }
    }
}

The effective learning rate of the weight and of the bias is the base learning rate multiplied by the corresponding lr_mult.

In PyTorch, is it possible to set different learning rates for the weight and the bias of one layer?
How would I write that?

jpeg729 · February 8, 2018, 3:58pm

This might help http://pytorch.org/docs/0.3.0/optim.html#per-parameter-options

elysion · February 9, 2018, 7:18am

Thank you! I read the docs. The example seems to set different learning rates for different layers. The docs say we can use a dict or param_group to set the learning rate per layer.
I'm new to PyTorch. Maybe there is a way to set the learning rate separately for the weight and the bias, but I can't find it.
Would you please tell me more about this? Thank you.

jpeg729 · February 9, 2018, 8:04am

The example shows how to set different options for layer.parameters(); you just need to dig a little deeper into the details.

E.g. for a Linear layer, the weight and bias parameters are named mylayer.weight and mylayer.bias.

optim.SGD([
                {'params': mylayer.weight},
                {'params': mylayer.bias, 'lr': 1e-3}
            ], lr=1e-2, momentum=0.9)
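To connect this back to the Caffe snippet, here is a minimal sketch of how the two lr_mult values might be mapped onto param groups. The names base_lr and conv2, the value of base_lr, and in_channels=64 are assumptions made for illustration (the Caffe layer only states num_output), and it assumes a recent PyTorch version.

```python
import torch.nn as nn
import torch.optim as optim

base_lr = 0.01  # assumed global learning rate, playing the role of Caffe's base_lr

# in_channels=64 is an assumption; the Caffe layer only specifies num_output: 64.
conv2 = nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1)
nn.init.kaiming_normal_(conv2.weight)  # roughly corresponds to the "msra" filler
nn.init.zeros_(conv2.bias)             # constant filler with value 0

optimizer = optim.SGD(
    [
        {"params": [conv2.weight], "lr": base_lr * 1.0},  # lr_mult: 1.0
        {"params": [conv2.bias], "lr": base_lr * 0.1},    # lr_mult: 0.1
    ],
    lr=base_lr,
    momentum=0.9,
)

# Each group keeps its own learning rate.
for group in optimizer.param_groups:
    print(group["lr"])
```

An lr given inside a group overrides the default lr passed to the optimizer, so only the groups that deviate from the default need an explicit value.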

elysion · February 9, 2018, 9:20am

Thank you so much for your patient guidance! I tried this code. It reports an error like this:

Traceback (most recent call last):
  File "/home/mitc/pycharm-2017.3.3/helpers/pydev/pydevd.py", line 1668, in <module>
    main()
  File "/home/mitc/pycharm-2017.3.3/helpers/pydev/pydevd.py", line 1662, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "/home/mitc/pycharm-2017.3.3/helpers/pydev/pydevd.py", line 1072, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/home/mitc/lcy/Pytorch-SR/nd_vdsr_unfold.py", line 137, in <module>
    ], lr=0.1, weight_decay=0.0001)
  File "/home/mitc/anaconda2/envs/lcy-pytorch/lib/python2.7/site-packages/torch/optim/adam.py", line 28, in __init__
    super(Adam, self).__init__(params, defaults)
  File "/home/mitc/anaconda2/envs/lcy-pytorch/lib/python2.7/site-packages/torch/optim/optimizer.py", line 61, in __init__
    raise ValueError("can't optimize a non-leaf Variable")
ValueError: can't optimize a non-leaf Variable

My code is:

class Net(nn.Module):
    def __init__(self):#1,3,11,13,1
        super(Net, self).__init__()
        self.layer11 = nn.Sequential(
            nn.BatchNorm3d(num_features=1,momentum=0.999,affine=False),
            nn.ReLU(inplace=True),
            nn.Conv3d(in_channels=1,out_channels=16,kernel_size=(3,3,3),padding=(1,1,1),bias=True))
        self.layer21 = nn.Sequential(
            nn.BatchNorm3d(num_features=16, momentum=0.999, affine=False),
            nn.ReLU(inplace=True),
            nn.Conv3d(in_channels=16, out_channels=16, kernel_size=(3, 3, 3), padding=(1, 1, 1), bias=True))
....

    def forward(self, x, residual):
        #residual = x1
        out = self.layer11(x)
        out = self.layer21(out)
        out = self.layer22(out)
        ....
        out = torch.add(out, residual)
        return out

if __name__=="__main__":
    net = Net()
    optimizer = optim.Adam([
                {'params': net.layer11[2].weight},
                {'params': net.layer11[2].bias, 'lr': 0.01}
            ], lr=0.1, weight_decay=0.0001)
.....

jpeg729 · February 9, 2018, 9:52am

This toy example works.

import torch
import torch.nn as nn
import torch.optim as optim
from torch.autograd import Variable

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.layer = nn.Linear(1, 1)
        self.layer.weight.data.fill_(1)
        self.layer.bias.data.fill_(1)

    def forward(self, x):
        return self.layer(x)

if __name__=="__main__":
    net = Net()
    optimizer = optim.Adam([
                {'params': net.layer.weight},
                {'params': net.layer.bias, 'lr': 0.01}
            ], lr=0.1, weight_decay=0.0001)
    out = net(Variable(torch.Tensor([[1]])))
    out.backward()
    optimizer.step()
    print("weight", net.layer.weight.data.numpy(), "grad", net.layer.weight.grad.data.numpy())
    print("bias", net.layer.bias.data.numpy(), "grad", net.layer.bias.grad.data.numpy())

Output is

weight [[ 0.90000004]] grad [[ 1.]]
bias [ 0.99000001] grad [ 1.]

As you can see, weight has been updated by ~0.1 and bias by ~0.01, which matches their respective learning rates. (Both gradients are 1 here, and the first Adam step has a magnitude close to the learning rate.)

The error you get suggests that you have asked the optimiser to optimise a Variable that isn’t a parameter of your model. But your partial code sample seems fine.
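One way to narrow down this kind of error is to check whether every tensor handed to the optimizer is a leaf. The following is a minimal sketch, assuming a recent PyTorch version (where the message mentions a Tensor rather than a Variable); the names layer, derived, and rejected are only illustrative.

```python
import torch
import torch.nn as nn
import torch.optim as optim

layer = nn.Conv3d(1, 16, kernel_size=3, padding=1)

# Registered parameters are leaf tensors, so an optimizer accepts them.
leaf_flags = {name: p.is_leaf for name, p in layer.named_parameters()}
print(leaf_flags)

# A tensor computed from a parameter is not a leaf.
derived = layer.weight * 2.0
print("derived.is_leaf:", derived.is_leaf)

# Passing such a tensor to an optimizer is refused with a ValueError.
try:
    optim.Adam([{"params": [derived]}], lr=0.1)
    rejected = False
except ValueError as err:
    rejected = True
    print("optimizer refused it:", err)
```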

elysion · February 9, 2018, 12:57pm

Thank you so much!
By running your code, I found that there are bugs in PyTorch version 0.1.12.
I changed the PyTorch version and it worked.

zxd123456 · March 18, 2018, 11:38am

Hello. Your solution is correct.
But I ran into a problem because my model has many nested modules, with parameter names like “module.mixed_stem2.branch0.0.bn.weight”.
So I always get an error when I use your method, because “branch0.0” is not valid syntax.
How should I iterate over the weights and biases of this model so I can set different learning rates for them?

elysion · March 18, 2018, 2:08pm

Try “branch0[0]” instead of “branch0.0”.
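As a small sketch of why indexing works: a dotted name such as branch0.0.bn.weight comes from an nn.Sequential whose children are keyed by position, so the child can be reached with [0] or looked up by its full name. The ConvBN and Stem classes below are hypothetical stand-ins, not the model from the question.

```python
import torch.nn as nn

class ConvBN(nn.Module):
    """Hypothetical conv + batch norm block."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3)
        self.bn = nn.BatchNorm2d(8)

class Stem(nn.Module):
    """Hypothetical module with a Sequential attribute named branch0."""
    def __init__(self):
        super().__init__()
        self.branch0 = nn.Sequential(ConvBN(), ConvBN())

model = Stem()

# Attribute access with an index, since "model.branch0.0" is not valid Python.
by_index = model.branch0[0].bn.weight

# Lookup by the dotted name reported by named_parameters().
by_name = dict(model.named_parameters())["branch0.0.bn.weight"]

print(by_index is by_name)
```

Both routes return the same Parameter object, so either can be passed to the optimizer.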

https://github.com/wkentaro/pytorch-fcn/blob/master/examples/voc/train_fcn32s.py#L105


The linked script gives a solution for setting weight-wise and bias-wise learning rates.
Just use its get_parameters() function:

def get_parameters(model, bias=False):
    import torch.nn as nn
    modules_skipped = (
        nn.ReLU,
        nn.MaxPool2d,
        nn.Dropout2d,
        nn.Sequential,
        torchfcn.models.FCN32s,
        torchfcn.models.FCN16s,
        torchfcn.models.FCN8s,
    )
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            if bias:
                yield m.bias
            else:
                yield m.weight
        elif isinstance(m, nn.ConvTranspose2d):
            # weight is frozen because it is just a bilinear upsampling
            if bias:
                assert m.bias is None
        elif isinstance(m, modules_skipped):
            continue
        else:
            raise ValueError('Unexpected module: %s' % str(m))
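For models that are not built only from the module types listed above, a more generic variant can split parameters by name instead of by module type. This is a sketch under assumptions, not the pytorch-fcn implementation: split_weights_and_biases, the example model, base_lr, and the 0.1 multiplier are all hypothetical.

```python
import torch.nn as nn
import torch.optim as optim

def split_weights_and_biases(model):
    """Return (weights, biases), splitting trainable parameters by name."""
    weights, biases = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue  # skip frozen parameters
        if name.endswith("bias"):
            biases.append(param)
        else:
            weights.append(param)
    return weights, biases

# Hypothetical example model.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(8, 8, kernel_size=3, padding=1),
)

weights, biases = split_weights_and_biases(model)

base_lr = 0.01  # assumed value
optimizer = optim.SGD(
    [
        {"params": weights},                      # uses the default lr
        {"params": biases, "lr": base_lr * 0.1},  # biases get a smaller lr
    ],
    lr=base_lr,
    momentum=0.9,
)
```

Note that this name-based split also puts the bias of normalization layers (for example BatchNorm) into the bias group, which may or may not be what you want.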

zxd123456 · March 18, 2018, 2:43pm

waooooo~~~~
Thank you very much. Your code is really simple and can solve my problem well.
I tried all night to build a big for loop to implement this, but your method is amazing. Thank you!

abe7 · November 11, 2020, 12:29pm

Can you just multiply the gradients for specific layers after loss.backward() and before optimizer.step() by a constant? Would that have the same effect?
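This can be checked empirically with a small sketch. It assumes a recent PyTorch version and reuses the one-parameter toy layer from earlier in the thread; the helper bias_after_one_step is hypothetical. In this setup, scaling the gradient matches a per-group learning rate for plain SGD (no weight decay), but not for Adam, because Adam normalizes each step by the gradient magnitude.

```python
import torch
import torch.nn as nn
import torch.optim as optim

def bias_after_one_step(optim_cls, use_param_groups):
    """Take one optimizer step and return the resulting bias value."""
    layer = nn.Linear(1, 1)
    with torch.no_grad():
        layer.weight.fill_(1.0)
        layer.bias.fill_(1.0)

    if use_param_groups:
        # Option A: the bias gets its own, 10x smaller learning rate.
        opt = optim_cls(
            [{"params": [layer.weight]}, {"params": [layer.bias], "lr": 0.01}],
            lr=0.1,
        )
    else:
        # Option B: one learning rate for everything.
        opt = optim_cls(layer.parameters(), lr=0.1)

    layer(torch.ones(1, 1)).sum().backward()

    if not use_param_groups:
        # Option B continued: scale the bias gradient by 0.1 before the step.
        layer.bias.grad.mul_(0.1)

    opt.step()
    return layer.bias.item()

sgd_groups = bias_after_one_step(optim.SGD, use_param_groups=True)
sgd_scaled = bias_after_one_step(optim.SGD, use_param_groups=False)
adam_groups = bias_after_one_step(optim.Adam, use_param_groups=True)
adam_scaled = bias_after_one_step(optim.Adam, use_param_groups=False)

print("SGD :", sgd_groups, sgd_scaled)
print("Adam:", adam_groups, adam_scaled)
```

With weight decay enabled, the two options can also diverge for SGD, since the decay term is added inside the optimizer and is not affected by scaling the gradient beforehand.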