Greetings to all,
I have a question regarding modifying learning rates for each weight in a module.
Specifically, having the update rule:
W = W - lr * grad(W),
how can I make “lr” a matrix with the same dimensions as W and multiply it element-wise with the gradient? Can I do that with one of the Optimizer implementations in PyTorch?
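In case it helps, here is a rough sketch of the manual update I have in mind (the shapes and the 0.01 value are just placeholders, not my real model):

```python
import torch

torch.manual_seed(0)

# Toy weight matrix and a learning-rate matrix of the same shape,
# so each weight entry can have its own learning rate
W = torch.randn(3, 4, requires_grad=True)
lr = torch.full_like(W, 0.01)

# Dummy loss just so W gets a gradient
loss = (W ** 2).sum()
loss.backward()

# Manual SGD-style step: element-wise product of lr and the gradient
with torch.no_grad():
    W -= lr * W.grad
    W.grad.zero_()
```

This works as a hand-rolled loop, but I would prefer to plug the per-element lr into an existing Optimizer instead of doing the step myself.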
I don’t know if this question makes sense.