I have seen that it is possible to set a different learning rate for a layer or group of parameters using the code below.

```
optim.SGD([
    {'params': mylayer.weight},
    {'params': mylayer.bias, 'lr': 1e-3},
], lr=1e-2, momentum=0.9)
```
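
For reference, here is a self-contained version of that snippet (the layer and its sizes are made up for illustration), showing that each group carries its own scalar lr:

```python
import torch.nn as nn
import torch.optim as optim

# Hypothetical layer, just to make the example runnable end to end.
mylayer = nn.Linear(4, 2)

optimizer = optim.SGD([
    {'params': [mylayer.weight]},            # inherits the default lr below
    {'params': [mylayer.bias], 'lr': 1e-3},  # per-group override
], lr=1e-2, momentum=0.9)

# Each param group keeps its own scalar learning rate.
print([g['lr'] for g in optimizer.param_groups])  # -> [0.01, 0.001]
```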

Is it possible to have a different learning rate per parameter, e.g.

```
optim.SGD([
    {'params': mylayer.weight, 'lr': [np.random.random() for i in range(len(mylayer.weight))]},
    {'params': mylayer.bias, 'lr': 1e-3},
], lr=1e-2, momentum=0.9)
```

?
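
For what it's worth, the closest I have come to element-wise rates is pre-scaling the gradient with a tensor hook and running SGD with `lr=1.0`, so the scaled gradient is applied as-is. This is a workaround rather than a real per-element `lr`, and the layer and rate tensor below are made up:

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
mylayer = nn.Linear(4, 2)  # hypothetical layer for illustration

# One rate per weight element, folded into the gradient by a hook.
elementwise_lr = torch.rand_like(mylayer.weight) * 1e-2
mylayer.weight.register_hook(lambda grad: grad * elementwise_lr)

optimizer = optim.SGD([
    {'params': [mylayer.weight], 'lr': 1.0},  # effective rate is elementwise_lr
    {'params': [mylayer.bias], 'lr': 1e-3},
])

mylayer(torch.ones(1, 4)).sum().backward()
before = mylayer.weight.detach().clone()
optimizer.step()

# The raw weight gradient here is all ones (d out / dW = input = ones),
# so the applied step equals elementwise_lr exactly.
print(torch.allclose(before - mylayer.weight.detach(), elementwise_lr))  # True
```

Note this only mimics plain SGD; with momentum or weight decay the scaling interacts with the buffers, so it is not equivalent to a true per-element learning rate there.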