Different learning rates for different types of modules

How can I make all PReLU layers use a learning rate that is 0.1 times the one used by the other layers?


Off the top of my head, there are two options:

  1. write your own lr scheduler (see examples here: https://github.com/pytorch/pytorch/blob/master/torch/optim/lr_scheduler.py)
  2. use different optimizers for different parts of your network.
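As a rough sketch of option 2, assuming a made-up toy model (the layer sizes and the names `opt_main` and `opt_prelu` are only for illustration), the PReLU parameters can be collected by module type and handed to their own optimizer with a 0.1x learning rate:

```python
import torch
import torch.nn as nn

# Hypothetical toy model; the layer sizes are arbitrary.
model = nn.Sequential(nn.Linear(4, 8), nn.PReLU(), nn.Linear(8, 2), nn.PReLU())

# Collect the parameters owned by PReLU modules, wherever they sit in the model.
prelu_params = [
    p
    for m in model.modules()
    if isinstance(m, nn.PReLU)
    for p in m.parameters(recurse=False)
]
prelu_ids = {id(p) for p in prelu_params}
other_params = [p for p in model.parameters() if id(p) not in prelu_ids]

base_lr = 1e-2
opt_main = torch.optim.SGD(other_params, lr=base_lr, momentum=0.9)
opt_prelu = torch.optim.SGD(prelu_params, lr=0.1 * base_lr, momentum=0.9)

# One training step: both optimizers must be zeroed and stepped.
loss = model(torch.randn(16, 4)).pow(2).mean()
opt_main.zero_grad()
opt_prelu.zero_grad()
loss.backward()
opt_main.step()
opt_prelu.step()
```

The downside of this route is that every training step has to remember to zero and step both optimizers.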

You can check out optim-per-parameter-options, where there is a small example of how to set different learning rates for your layers.

Optimizers also support specifying per-parameter options. To do this, instead of passing an iterable of Variables, pass in an iterable of dicts. Each of them will define a separate parameter group, and should contain a params key, containing a list of parameters belonging to it. Other keys should match the keyword arguments accepted by the optimizers, and will be used as optimization options for this group.

optim.SGD([
    {'params': model.base.parameters()},
    {'params': model.classifier.parameters(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)


The problem is that my custom module is used in many places in the model. The instances are spread out, not grouped together, but they all have the same type, like PReLU used in different places of the model.
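Per-parameter options can still cover this case if the parameters are gathered by module type rather than by attribute name. A minimal sketch, assuming a hypothetical helper `split_params_by_type` and a made-up `Block` module (neither is part of PyTorch):

```python
import torch
import torch.nn as nn

def split_params_by_type(model, module_type):
    """Hypothetical helper: returns (params owned by module_type instances, all other params)."""
    selected = []
    for m in model.modules():
        if isinstance(m, module_type):
            # recurse=False: only parameters owned directly by this module.
            selected.extend(m.parameters(recurse=False))
    selected_ids = {id(p) for p in selected}
    rest = [p for p in model.parameters() if id(p) not in selected_ids]
    return selected, rest

class Block(nn.Module):
    """Made-up block with a PReLU buried inside it."""
    def __init__(self, n_in, n_out):
        super().__init__()
        self.fc = nn.Linear(n_in, n_out)
        self.act = nn.PReLU()

    def forward(self, x):
        return self.act(self.fc(x))

model = nn.Sequential(Block(4, 8), Block(8, 8), nn.Linear(8, 2))
prelu_params, other_params = split_params_by_type(model, nn.PReLU)

base_lr = 1e-2
optimizer = torch.optim.SGD(
    [
        {"params": other_params},                       # uses the default lr
        {"params": prelu_params, "lr": 0.1 * base_lr},  # 0.1x for every PReLU
    ],
    lr=base_lr,
    momentum=0.9,
)
```

Passing your own module class instead of `nn.PReLU` should work the same way, since the split only relies on `isinstance`.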

I just remembered this option and came here to comment. Thanks for pointing that out before me @ptrbick!

Oh, so you want its parameter to get 0.1 of the original gradient no matter where it is used? How about adding an automatic backward hook on that module’s parameter? You can even do that in the constructor.

Could you please give a piece of example code or a link?

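import torch
import torch.nn as nn

class DDReLU(nn.Module):
    def __init__(self):
        super(DDReLU, self).__init__()
        self.threshold = nn.Parameter(torch.rand(1), requires_grad=True)
        # Register the hook on the parameter tensor, not on the module:
        # a tensor hook receives the gradient and returns the scaled one.
        self.threshold.register_hook(lambda grad: grad * 0.1)
        self.ReLU = nn.ReLU(True)

    def forward(self, x):
        return self.ReLU(x) + self.threshold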
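To check that a hook registered on the parameter itself (via the tensor method `register_hook`) really scales the gradient, something like the following could be used. The class name `ScaledThresholdReLU` and the `grad_scale` argument are made up for this sketch:

```python
import torch
import torch.nn as nn

class ScaledThresholdReLU(nn.Module):
    """Hypothetical module: ReLU plus a learnable offset whose gradient is scaled."""
    def __init__(self, grad_scale=0.1):
        super().__init__()
        self.threshold = nn.Parameter(torch.rand(1))
        # Tensor hook: runs during backward, before the gradient is stored in .grad.
        self.threshold.register_hook(lambda grad: grad * grad_scale)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(x) + self.threshold

x = torch.randn(5, 3)
layer = ScaledThresholdReLU(grad_scale=0.1)
layer(x).sum().backward()

# Without the hook the gradient of the sum w.r.t. threshold would be x.numel();
# with the hook it is scaled by grad_scale.
expected = torch.tensor([0.1 * x.numel()])
print(torch.allclose(layer.threshold.grad, expected))
```

Note that scaling the gradient matches a 0.1x learning rate for plain SGD (without weight decay), but adaptive optimizers such as Adam normalize the gradient magnitude, so there the per-parameter `lr` option is probably the safer route.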