I have a fairly complicated neural model that trains fine with the Adadelta optimizer at a learning rate of 0.2.
Each epoch takes around 6 minutes, so setting the learning rate too small results in a huge training time.
I recently added a parameter $\lambda$ that ranges between 0 and 1 and is responsible for taking a linear combination of two scores in the model. But I noticed that with learning rate 0.2, the value of $\lambda$ quickly becomes too large or too small because the update steps are too big.
What do people do in such cases? How can I constrain a parameter to lie between 0 and 1, and/or use a different learning rate for a particular parameter?