Hi, let’s say I have a convolution layer with weights of size [64, 64, 3, 3]. I want to give different learning rates to different parts of the kernel; actually, I want to freeze part of the kernel (for pure research). I first tried:
model.layer_name.weight[1:,:,:,:].requires_grad = False
This returned:
RuntimeError: you can only change requires_grad flags of leaf variables. If you want to use a computed variable in a subgraph that doesn't require differentiation use var_no_grad = var.detach().
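For context, the workaround I’ve been considering instead (a minimal sketch on a standalone Conv2d, not my actual model; mask_frozen_slice is just a name I made up) is to leave requires_grad=True on the whole weight and zero out the gradient of the slice with a tensor hook:

import torch
import torch.nn as nn

conv = nn.Conv2d(64, 64, kernel_size=3, padding=1)

def mask_frozen_slice(grad):
    # A tensor hook may return a replacement gradient; clone first so
    # the gradient autograd computed isn't modified in place.
    grad = grad.clone()
    grad[1:, :, :, :] = 0  # "freeze" every output filter except the first
    return grad

conv.weight.register_hook(mask_frozen_slice)

# Sanity check: only filter 0 should receive a gradient.
x = torch.randn(1, 64, 8, 8)
conv(x).sum().backward()
print(conv.weight.grad[0].abs().sum(), conv.weight.grad[1:].abs().sum())

One thing I’m unsure about: with weight decay in the optimizer, SGD adds weight_decay * p to the gradient, so the masked slice would still shrink; I guess that parameter would need its own group with weight_decay=0.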
I also tried to work out how to adapt the usual pattern I use for giving different layers different learning rates:
import torch

parameters = []
ft_module_names = ['layername']
for k, v in model.named_parameters():
    for ft_module in ft_module_names:
        if ft_module in k:  # k is e.g. 'layername.weight', so test substring, not equality
            parameters.append({'params': v, 'lr': args.lr_new})
            break
    else:
        parameters.append({'params': v})

optimizer = torch.optim.SGD(parameters, args.lr,
                            momentum=args.sgd_momentum,
                            weight_decay=args.weight_decay)
But I couldn’t figure out how to modify this to support different learning rates inside the same layer, since named_parameters() only yields whole tensors.
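The only direction I’ve come up with so far (a rough sketch with a toy module; SplitKernelConv and the slicing are made up for illustration, and bias is omitted) is to store the kernel as two separate Parameters, concatenate them in forward, and give each piece its own param group:

import torch
import torch.nn as nn
import torch.nn.functional as F

class SplitKernelConv(nn.Module):
    # The kernel lives as two leaf tensors, so each slice is a real
    # Parameter that can go into its own optimizer param group.
    def __init__(self):
        super().__init__()
        w = torch.empty(64, 64, 3, 3)
        nn.init.kaiming_uniform_(w, a=5 ** 0.5)  # default Conv2d init
        self.w_head = nn.Parameter(w[:1].clone())  # first output filter, trainable
        self.w_rest = nn.Parameter(w[1:].clone())  # the part I want frozen

    def forward(self, x):
        weight = torch.cat([self.w_head, self.w_rest], dim=0)
        return F.conv2d(x, weight, padding=1)

conv = SplitKernelConv()
optimizer = torch.optim.SGD(
    [{'params': [conv.w_head], 'lr': args.lr_new},
     {'params': [conv.w_rest], 'lr': 0.0}],  # lr 0 means SGD never updates it
    lr=args.lr, momentum=args.sgd_momentum, weight_decay=args.weight_decay)

But this changes the module structure and feels clunky, so I’m hoping there’s a cleaner way.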
I’d be glad to hear any ideas!
Thanks!