Learning rate as a matrix

Greetings to all,
I have a question about using a different learning rate for each weight in a module.
Specifically, given the update rule
W = W - lr * grad(W),
how can I make “lr” a matrix with the same dimensions as W and perform an element-wise multiplication with the gradient? Can I do that with one of the Optimizer implementations in PyTorch?
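For concreteness, here is a minimal sketch of the update I have in mind, written as a manual tensor update (the 2x3 shape and the 0.1 value are just placeholders):

```python
import torch

# Hypothetical example: a 2x3 weight matrix with a matching per-weight lr matrix.
W = torch.ones(2, 3, requires_grad=True)
lr = torch.full((2, 3), 0.1)  # one learning rate per weight

loss = (W ** 2).sum()
loss.backward()  # grad(W) = 2 * W, so every entry of W.grad is 2.0

with torch.no_grad():
    W -= lr * W.grad  # element-wise product of the lr matrix and the gradient

print(W)  # every entry is 1.0 - 0.1 * 2.0 = 0.8
```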

I don’t know if this question makes sense.


I am afraid the provided optimizers won’t allow that.
I think the simplest approach is to reimplement whichever optimizer you want to use, based on the original implementation, like the one for sgd here.
For SGD this should be fairly simple, but I am not sure how Adam is supposed to behave in that case.
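As a rough sketch of that reimplementation idea, a plain-SGD variant subclassing `torch.optim.Optimizer` could accept one lr tensor per parameter and apply it element-wise. `PerWeightSGD` is a hypothetical name, and this assumes a single parameter group with lr tensors given in the same order as the parameters:

```python
import torch
from torch.optim import Optimizer

class PerWeightSGD(Optimizer):
    """Sketch: plain SGD where lr is a tensor with the same shape as each parameter."""

    def __init__(self, params, lr_tensors):
        # lr_tensors: one lr tensor per parameter, matching that parameter's shape.
        super().__init__(params, defaults={})
        self.lr_tensors = lr_tensors

    @torch.no_grad()
    def step(self):
        # Assumes a single param group, with lrs in the same order as the params.
        for group in self.param_groups:
            for p, lr in zip(group["params"], self.lr_tensors):
                if p.grad is not None:
                    p.add_(-lr * p.grad)  # element-wise lr * grad update

w = torch.ones(2, 2, requires_grad=True)
opt = PerWeightSGD([w], [torch.full((2, 2), 0.5)])
w.sum().backward()  # grad is 1.0 everywhere
opt.step()
print(w)  # every entry is 1.0 - 0.5 * 1.0 = 0.5
```

Momentum, weight decay, etc. would need the same element-wise treatment if you add them.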