There is a paper by Schaul et al., “Adaptive learning rates and…” https://arxiv.org/abs/1301.3764 (an improvement upon “No More Pesky Learning Rates” https://arxiv.org/abs/1206.1106, also by Schaul et al.)
and a Python implementation of the algorithm by the author:
The paper claims it is an optimization algorithm with “linear complexity and is hyper-parameter free”. With it there is no need to set a learning rate at all, and therefore no need to decay or adapt one, which sounds really good.
But no DL library seems to implement it. Is there any reason for this?
Does it require too much memory or computation?
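For reference, here is a minimal sketch of the elementwise update (“vSGD-l”) as I understand it from the 2012 paper: moving averages of the gradient, the squared gradient and the curvature give a per-parameter rate eta = g_bar² / (h_bar · v_bar), and a per-parameter memory size tau controls how fast those averages forget. It is not the author's code. Assumptions: the class and function names are hypothetical, the slow-start initialisation is simplified, and instead of bbprop (2012) or the finite-difference curvature estimate (2013) it uses the exact diagonal Hessian of a toy noisy quadratic.

```python
import random


class VSGDLocal:
    """Sketch of elementwise vSGD ("vSGD-l"), after Schaul, Zhang & LeCun (2012).

    Extra state is four length-n lists (g_bar, v_bar, h_bar, tau), so memory
    and per-step work are both O(n) in the number of parameters.
    """

    def __init__(self, init_grads, init_curvs, eps=1e-12):
        # Simplified slow start (assumption): average a few gradient and
        # curvature samples, and set each memory size tau to the sample count.
        n0 = len(init_grads)
        n = len(init_grads[0])
        self.eps = eps
        self.g_bar = [sum(g[i] for g in init_grads) / n0 for i in range(n)]
        self.v_bar = [sum(g[i] ** 2 for g in init_grads) / n0 + eps for i in range(n)]
        self.h_bar = [sum(abs(h[i]) for h in init_curvs) / n0 + eps for i in range(n)]
        self.tau = [float(n0)] * n

    def step(self, theta, grad, curv):
        """One update from a stochastic gradient and a diagonal curvature sample."""
        new_theta = []
        for i in range(len(theta)):
            w = 1.0 / self.tau[i]
            # Moving averages of gradient, squared gradient and curvature.
            self.g_bar[i] = (1 - w) * self.g_bar[i] + w * grad[i]
            self.v_bar[i] = (1 - w) * self.v_bar[i] + w * grad[i] ** 2
            self.h_bar[i] = (1 - w) * self.h_bar[i] + w * abs(curv[i])
            g2 = self.g_bar[i] ** 2
            # Learning rate eta_i = g_bar^2 / (h_bar * v_bar): nothing to hand-tune.
            eta = g2 / (self.h_bar[i] * self.v_bar[i] + self.eps)
            # Memory grows when the gradient looks like noise, shrinks otherwise.
            self.tau[i] = (1 - g2 / (self.v_bar[i] + self.eps)) * self.tau[i] + 1
            new_theta.append(theta[i] - eta * grad[i])
        return new_theta


def noisy_quadratic_demo(steps=5000, seed=0):
    """Hypothetical test problem: f(theta) = 0.5 * sum(a_i * (theta_i - m_i)^2)
    with Gaussian noise on each gradient; its exact diagonal Hessian is a."""
    rng = random.Random(seed)
    a = [1.0, 4.0, 0.5]
    target = [1.0, -2.0, 0.5]
    noise = 0.1

    def sample_grad(theta):
        return [ai * (t - m) + rng.gauss(0.0, noise)
                for ai, t, m in zip(a, theta, target)]

    theta = [0.0, 0.0, 0.0]
    # Slow start from 10 samples taken at the initial point.
    opt = VSGDLocal([sample_grad(theta) for _ in range(10)], [a] * 10)
    for _ in range(steps):
        theta = opt.step(theta, sample_grad(theta), a)
    return theta, target
```

In this sketch the extra state is four floats per parameter, compared with two moment buffers for Adam. In a real network, the extra computation would come mostly from obtaining the curvature sample; a finite-difference estimate, for instance, needs a second gradient evaluation.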