Adadelta torch implementation

This is an image from Torch's documentation.

Adadelta's original paper completely removes the learning rate and uses RMS(parameter updates) instead. But Torch's implementation adds a learning rate on top of that. Is there a particular reason why this was done?

I guess it's to give users more flexibility. The learning rate is set to 1.0 by default, so it should match the original implementation you are referring to.
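To make the relationship concrete, here is a minimal, framework-free sketch of the Adadelta update for a single scalar parameter. The `lr` factor at the end is an assumption standing in for the extra scaling Torch applies; with `lr=1.0` the update reduces exactly to the rule from the original paper (RMS of past updates divided by RMS of past gradients).

```python
import math

def adadelta_step(x, grad, state, rho=0.9, eps=1e-6, lr=1.0):
    """One Adadelta update on a scalar parameter x.

    state holds the running averages E[g^2] and E[dx^2].
    lr is the extra Torch-style scaling; lr=1.0 recovers the paper's rule.
    """
    # Accumulate squared gradients: E[g^2] <- rho * E[g^2] + (1 - rho) * g^2
    state["sq_grad"] = rho * state["sq_grad"] + (1 - rho) * grad * grad
    # Paper's update: delta = -RMS(dx) / RMS(g) * g (no learning rate needed)
    delta = -math.sqrt(state["sq_delta"] + eps) / math.sqrt(state["sq_grad"] + eps) * grad
    # Accumulate squared updates: E[dx^2] <- rho * E[dx^2] + (1 - rho) * dx^2
    state["sq_delta"] = rho * state["sq_delta"] + (1 - rho) * delta * delta
    # The extra scaling in question; drop (or set to 1.0) for the original rule
    return x + lr * delta

# Minimize f(x) = x^2 (gradient 2x), starting from x = 1.0
x = 1.0
state = {"sq_grad": 0.0, "sq_delta": 0.0}
for _ in range(500):
    x = adadelta_step(x, 2 * x, state)
```

Note how the ratio of the two RMS terms already has the units of a parameter update, which is why the paper could drop the learning rate entirely; the `lr` here just rescales that step uniformly.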