Optimizers: RProp vs SGD

Hello Fellows,

I was reading about RMSProp when I came across an optimizer called RProp (see the reference for RMSProp).

At first it seemed to be just like the SGD update. Then I came across this statement:

The RProp algorithm does not work for mini-batches because it violates the central idea behind stochastic gradient descent: when we have a small enough learning rate, it averages the gradients over successive mini-batches.

Hence, I concluded that RProp is different from SGD. But other than the use of mini-batches instead of the full batch (which is what brings the solution into the stochastic realm), I didn't see any other changes.

Then RProp is mentioned again by The Godfather here, making it sound very similar to regular gradient descent (no mini-batches).

However, I got really confused when I saw RProp on the list of optimizers, given that we already have SGD there and the mini-batch size is not a parameter set on the optimizer.


So, can anyone share the difference between them?

That wiki just confuses readers by mentioning RProp without explaining it properly. RProp is basically a heuristic that adjusts a per-weight step size based on whether the sign of the gradient agrees from one update to the next; the magnitude of the gradient is ignored. Newer methods use more sophisticated measures than the sign (and, as a consequence, they work better with mini-batches).
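
To make the difference concrete, here is a minimal sketch in plain Python for a single scalar weight. It is a simplified RProp variant without weight backtracking, not necessarily what any particular library implements. The names `sgd_step`, `rprop_step`, `eta_plus`, `eta_minus`, `step_min` and `step_max`, their default values, and the gradient sequence are all assumptions made up for illustration.

```python
def sign(x):
    """Return -1, 0, or +1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def sgd_step(w, grad, lr=0.1):
    """Plain SGD: the update is proportional to the gradient magnitude."""
    return w - lr * grad


def rprop_step(w, grad, prev_grad, step,
               eta_plus=1.2, eta_minus=0.5,
               step_min=1e-6, step_max=50.0):
    """Simplified RProp (no weight backtracking) for one scalar weight.

    Only the sign of the gradient is used. The step size grows when the
    sign agrees with the previous gradient and shrinks when it flips.
    """
    agreement = sign(grad) * sign(prev_grad)
    if agreement > 0:
        step = min(step * eta_plus, step_max)
    elif agreement < 0:
        step = max(step * eta_minus, step_min)
    # The move has a fixed size `step`; the gradient only picks the direction.
    w = w - sign(grad) * step
    return w, grad, step


# Hypothetical mini-batch gradients for one weight: nine small positive
# ones and one large negative one, so they sum to roughly zero.
grads = [0.1] * 9 + [-0.9]

w_sgd = 0.0
for g in grads:
    w_sgd = sgd_step(w_sgd, g)

w_rprop, prev_grad, step = 0.0, 0.0, 0.01
for g in grads:
    w_rprop, prev_grad, step = rprop_step(w_rprop, g, prev_grad, step)

print("SGD:  ", w_sgd)
print("RProp:", w_rprop)
```

With these made-up gradients, the SGD weight ends up back near zero because the updates cancel out across mini-batches. The RProp weight moves nine times in one direction and only once in the other, so it drifts away from zero. That is the mini-batch problem the quoted statement is describing, and it is also why RProp is a separate optimizer from SGD even though neither takes the batch size as a parameter.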