Trying to learn a linear combination parameter with Adadelta and learning rate = 0.2


I have a pretty complicated neural model that trains fine with the Adadelta optimizer and learning rate = 0.2.
It takes around 6 minutes per epoch, so setting the learning rate too small results in a huge training time.

I recently added a parameter lambda that should range between 0 and 1 and is responsible for taking a linear combination of 2 scores in the model. But I noticed that with learning rate = 0.2, the value of lambda becomes too large or too small because of the big steps.

What do people do in such cases? How can we constrain a parameter to be between 0 and 1, and/or use different learning rates for such variables?


I’m not sure how lambda was implemented, but regarding different learning rates for different parameters, have a look at the per-parameter options of the optimizer.
Could you post a small code snippet showing what lambda is actually doing?
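
In the meantime, here is a rough sketch of how both ideas could look, assuming the model is written in PyTorch (the thread does not say so explicitly). `ScoreMixer`, `proj`, and `raw_lambda` are made-up names standing in for the real model, and the learning rate of 0.02 for lambda is only an illustrative value, not a recommendation. The sketch stores an unconstrained value and passes it through a sigmoid, so the effective lambda always stays between 0 and 1, and it puts that value in its own parameter group with a smaller learning rate.

```python
import torch
import torch.nn as nn


class ScoreMixer(nn.Module):
    """Hypothetical stand-in for the part of the model that mixes two scores."""

    def __init__(self):
        super().__init__()
        # Placeholder for the rest of the model: produces two scores per sample.
        self.proj = nn.Linear(4, 2)
        # Unconstrained raw value; sigmoid(0) = 0.5 gives an even starting mix.
        self.raw_lambda = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        scores = self.proj(x)
        # The sigmoid keeps the effective lambda strictly inside (0, 1).
        lam = torch.sigmoid(self.raw_lambda)
        return lam * scores[:, 0] + (1 - lam) * scores[:, 1]


model = ScoreMixer()

# Per-parameter options: each dict is one parameter group.
# Groups without their own "lr" fall back to the default lr=0.2.
optimizer = torch.optim.Adadelta(
    [
        {"params": model.proj.parameters()},
        {"params": [model.raw_lambda], "lr": 0.02},  # assumed smaller rate
    ],
    lr=0.2,
)

# One dummy training step on random data to show the pieces fit together.
x = torch.randn(8, 4)
target = torch.randn(8)
loss = ((model(x) - target) ** 2).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()

lam_value = torch.sigmoid(model.raw_lambda).item()
print(f"effective lambda after one step: {lam_value:.4f}")
```

Another option would be to keep lambda as a plain parameter and clamp it back into [0, 1] after every optimizer step, but the sigmoid version avoids the parameter getting stuck exactly at a boundary.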