Adadelta torch implementation

This is an image from Torch's documentation.

Adadelta's original paper completely removes the learning rate and uses RMS(parameter updates) instead. But Torch's implementation adds a learning rate on top of that. Is there a particular reason why this was done?

I guess it's to give users more flexibility. The learning rate is set to 1.0 by default, so it should match the original implementation you are referring to.
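To make the relationship concrete, here is a minimal, framework-free sketch of the Adadelta update for a single scalar parameter. The `lr` factor at the end is an assumption standing in for the extra scaling Torch applies; with `lr=1.0` the update reduces exactly to the rule from the original paper (RMS of past updates divided by RMS of past gradients).

```python
import math

def adadelta_step(x, grad, state, rho=0.9, eps=1e-6, lr=1.0):
    """One Adadelta update on a scalar parameter x.

    state holds the running averages E[g^2] and E[dx^2].
    lr is the extra Torch-style scaling; lr=1.0 recovers the paper's rule.
    """
    # Accumulate squared gradients: E[g^2] <- rho * E[g^2] + (1 - rho) * g^2
    state["sq_grad"] = rho * state["sq_grad"] + (1 - rho) * grad * grad
    # Paper's update: delta = -RMS(dx) / RMS(g) * g (no learning rate needed)
    delta = -math.sqrt(state["sq_delta"] + eps) / math.sqrt(state["sq_grad"] + eps) * grad
    # Accumulate squared updates: E[dx^2] <- rho * E[dx^2] + (1 - rho) * dx^2
    state["sq_delta"] = rho * state["sq_delta"] + (1 - rho) * delta * delta
    # The extra scaling in question; drop (or set to 1.0) for the original rule
    return x + lr * delta

# Minimize f(x) = x^2 (gradient 2x), starting from x = 1.0
x = 1.0
state = {"sq_grad": 0.0, "sq_delta": 0.0}
for _ in range(500):
    x = adadelta_step(x, 2 * x, state)
```

Note how the ratio of the two RMS terms already has the units of a parameter update, which is why the paper could drop the learning rate entirely; the `lr` here just rescales that step uniformly.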