Greetings to all,
I have a question regarding modifying learning rates for each weight in a module.
Specifically, having the update rule:
W = W - lr * grad(W),
how can I make “lr” a matrix with the same dimensions as W and multiply it element-wise with the gradient? Can I do that with one of the Optimizer implementations in PyTorch?
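In case it helps, here is a rough sketch of the manual update I have in mind (the shapes and the 0.01 value are just placeholders, not my real model):

```python
import torch

torch.manual_seed(0)

# Toy weight matrix and a learning-rate matrix of the same shape,
# so each weight entry can have its own learning rate
W = torch.randn(3, 4, requires_grad=True)
lr = torch.full_like(W, 0.01)

# Dummy loss just so W gets a gradient
loss = (W ** 2).sum()
loss.backward()

# Manual SGD-style step: element-wise product of lr and the gradient
with torch.no_grad():
    W -= lr * W.grad
    W.grad.zero_()
```

This works as a hand-rolled loop, but I would prefer to plug the per-element lr into an existing Optimizer instead of doing the step myself.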
I don’t know if this question makes sense.