Weight/neuron-based learning rate

I am trying to have an individual learning rate for each neuron, that is, the learning rate becomes a matrix. How would one implement that for linear layers only?
Thanks!

You can do that by defining a separate optimizer (or parameter group) for each of the neurons. This link may help you.
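As a rough sketch of the "separate optimizers" idea, here it is at the coarser granularity of whole layers (the model and learning rates below are just placeholders); getting it down to individual neurons requires splitting the weight tensor, as discussed further down the thread:

```python
import torch
import torch.nn as nn

# Hypothetical two-layer model; each layer gets its own optimizer and lr.
model = nn.Sequential(nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 2))

opt_first = torch.optim.SGD(model[0].parameters(), lr=1e-2)
opt_second = torch.optim.SGD(model[2].parameters(), lr=1e-3)

x, y = torch.randn(8, 10), torch.randn(8, 2)
loss = nn.functional.mse_loss(model(x), y)

# Zero, backprop once, then step every optimizer.
opt_first.zero_grad()
opt_second.zero_grad()
loss.backward()
opt_first.step()
opt_second.step()
```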

I think it’s not currently possible in an automatic way. The LR applies to parameter groups (that is, to the nn.Parameter objects belonging to an nn.Module). Here you want a different learning rate for each element of a tensor, which is not supported (as far as I know).
It could be done manually by defining each of the values of the tensor as its own nn.Parameter.
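A rough sketch of that manual route, at per-neuron rather than per-value granularity (the module name PerNeuronLinear and the learning-rate values are my own placeholders): give every output row of the weight its own nn.Parameter and its own parameter group.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PerNeuronLinear(nn.Module):
    """Linear layer whose weight rows are separate parameters,
    so each output neuron can sit in its own optimizer group."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.rows = nn.ParameterList(
            [nn.Parameter(torch.randn(in_features) * 0.01) for _ in range(out_features)]
        )
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        weight = torch.stack(list(self.rows))  # (out_features, in_features)
        return F.linear(x, weight, self.bias)

layer = PerNeuronLinear(10, 4)

# Hypothetical per-neuron learning rates, one parameter group per weight row.
neuron_lrs = [1e-2, 5e-3, 1e-3, 5e-4]
optimizer = torch.optim.SGD(
    [{"params": [p], "lr": lr} for p, lr in zip(layer.rows, neuron_lrs)]
    + [{"params": [layer.bias], "lr": 1e-3}]
)
```

This stays within what the optimizers support (a scalar lr per group), at the cost of fragmenting the weight into many small parameters.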

What about optimizers like Adam or RMSprop? Don’t they effectively already do this? They modify the raw gradient with a preconditioning term that depends on each individual weight.

@AlphaBetaGamma96 No, they do not, as far as I can tell from their source.
When I try to pass a matrix as the ‘lr’ value of a parameter group (no matter whether SGD, RMSprop, etc.), the optimizer validates the value and fails before going any further…
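One manual workaround I can sketch (not a built-in option; the lr_matrix below is my own placeholder): keep the optimizer’s lr at 1.0 and scale each gradient element-wise by its own learning rate right before step(). This only behaves like a true per-element learning rate for plain SGD without momentum, since adaptive optimizers would rescale the gradient again.

```python
import torch
import torch.nn as nn

layer = nn.Linear(10, 4)

# Hypothetical element-wise learning-rate matrices, same shapes as the parameters.
lr_matrix = torch.full_like(layer.weight, 1e-2)
lr_bias = torch.full_like(layer.bias, 1e-2)

# Plain SGD with lr=1.0; the effective step size comes from the matrices above.
optimizer = torch.optim.SGD(layer.parameters(), lr=1.0)

x = torch.randn(8, 10)
loss = layer(x).pow(2).mean()

optimizer.zero_grad()
loss.backward()
with torch.no_grad():
    layer.weight.grad.mul_(lr_matrix)  # per-element learning rates
    layer.bias.grad.mul_(lr_bias)
optimizer.step()
```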

The learning rate of Adam or RMSprop is only the ‘global’ learning rate; the effective per-weight step size is determined by the exponential moving averages kept for each weight. That is what I mean: if you print out the exp. moving averages, they have the same shape as the parameter, which gives you an effective learning rate for each weight/bias of your network.
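For example, a quick sketch to show the shapes (the layer and lr here are arbitrary):

```python
import torch
import torch.nn as nn

layer = nn.Linear(10, 4)
optimizer = torch.optim.Adam(layer.parameters(), lr=1e-3)

loss = layer(torch.randn(8, 10)).pow(2).mean()
loss.backward()
optimizer.step()  # populates Adam's per-parameter state

for p in layer.parameters():
    state = optimizer.state[p]
    # exp_avg and exp_avg_sq track first/second moments per element,
    # so they have exactly the same shape as the parameter itself.
    print(p.shape, state["exp_avg"].shape, state["exp_avg_sq"].shape)
```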