Adaptive optimizer vs SGD (need for speed)

Adaptive optimizers can produce better models than SGD, but they also take more time and resources.

The challenge: I have a huge amount of training data, Adagrad takes 4x longer than SGD on it, and I want to reduce the training time while still ending up with a good model.
A few options I’m considering:

  1. Just sample x% of the huge dataset for training; Adagrad can still produce a good model with less training time.
  2. Use 100% of the dataset, but get more GPUs for training.
  3. Use 100% of the dataset, but try some other idea. Any suggestions?

Hope someone could share interesting ideas, thanks.

How about a hybrid approach where you go back and forth between an adaptive optimizer and SGD: use the adaptive optimizer to discover some locally good learning rates, then use SGD to optimize at those rates. To illustrate a bit more, it would be something like this (where T is some generic time unit):

  • adaptive learn for T
  • SGD for 10 * T, using the most recent learning rates from the previous step
  • adaptive learn for T
  • SGD for 10 * T, using the most recent learning rates from the previous step
  • … keep doing this …

This way you spend almost all of your compute time in SGD mode but you use adaptive learning to guide you. The swimming analogy would be that you don’t swim with your head out of the water the whole way, you just pick it up from time to time to ensure you’re headed in the right direction.
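To make the alternation concrete, here’s a toy pure-Python sketch of the idea (the two-coordinate quadratic objective, the curvatures, the base learning rate, and the step counts are all made-up values, and Adagrad stands in for the adaptive optimizer): run Adagrad for T steps, freeze its current effective per-coordinate step sizes, then let plain SGD reuse them for 10*T steps.

```python
import math

# Toy objective: f(w) = 0.5 * sum(a_i * w_i**2), so grad_i = a_i * w_i.
# The curvatures differ per coordinate, which is where adaptivity helps.
A = [100.0, 1.0]          # per-coordinate curvature (assumed toy values)
w = [1.0, 1.0]            # parameters
G = [0.0, 0.0]            # Adagrad's accumulated squared gradients
base_lr, eps = 0.5, 1e-8
T = 5                     # length of each adaptive phase

def grad(w):
    return [a * x for a, x in zip(A, w)]

for round_ in range(3):
    # Phase 1: Adagrad for T steps, updating the accumulators as usual.
    for _ in range(T):
        g = grad(w)
        for i in range(len(w)):
            G[i] += g[i] ** 2
            w[i] -= base_lr / (math.sqrt(G[i]) + eps) * g[i]
    # Freeze Adagrad's current effective per-coordinate step sizes...
    frozen_lr = [base_lr / (math.sqrt(Gi) + eps) for Gi in G]
    # ...and run plain SGD with them for 10*T steps (no accumulator updates).
    for _ in range(10 * T):
        g = grad(w)
        for i in range(len(w)):
            w[i] -= frozen_lr[i] * g[i]

loss = 0.5 * sum(a * x * x for a, x in zip(A, w))
print(loss)
```

The point is that almost all the steps are the cheap SGD kind; the short adaptive phases only refresh the frozen per-coordinate rates.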

Hey Andrei, this back-and-forth approach sounds interesting, but I’m not sure it can work: after T of adaptive learning, how could its latest learning rate be used for SGD for 10*T?
My thought is that the learning rate from adaptive learning is actually a per-feature learning-rate vector, so how would SGD borrow it?

SGD supports per-parameter learning rates; you just have to pass them individually when you instantiate the optimizer.
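For illustration, here’s a minimal sketch of the param-group idea being described (a hand-rolled stand-in, not any real framework’s optimizer API; all names and values are made up): each group of parameters is registered with its own fixed learning rate when the optimizer is constructed.

```python
# Minimal stand-in for an SGD optimizer with per-group learning rates.
class GroupedSGD:
    def __init__(self, param_groups):
        # param_groups: list of {"params": [rows...], "lr": float}
        self.param_groups = param_groups

    def step(self, grads):
        # grads mirrors the nesting of param_groups
        for group, group_grads in zip(self.param_groups, grads):
            lr = group["lr"]
            for p, g in zip(group["params"], group_grads):
                for i in range(len(p)):
                    p[i] -= lr * g[i]   # plain SGD update, group-specific lr

# Hypothetical model with two "layers": an embedding-like table and a head.
embedding = [[1.0, 1.0], [2.0, 2.0]]
head = [[0.5, 0.5]]
opt = GroupedSGD([
    {"params": embedding, "lr": 0.1},   # one rate for the whole layer
    {"params": head, "lr": 0.01},       # a different rate for this layer
])
# One step with all-ones gradients for simplicity.
opt.step([[[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0]]])
```

Note that each group shares one scalar rate, so the granularity here is per layer (or per tensor), not per element.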

Here’s another relevant thread on tips for doing this in a convenient way.

Thanks for the per-param links.
However, that doesn’t seem to be the trick I’m looking for. For example, I have a big Embedding layer in the model, where each row is the embedding for one word in the vocabulary. Adaptive optimizers can apply a different effective learning rate to each dimension of each embedding vector, but I think the SGD per-param option just assigns the same learning rate to the whole Embedding layer’s parameters.

My understanding is that Adam differs from SGD in that it makes its step size (the multiplier of the batch gradient) a parameter-dependent, stateful quantity: roughly the recent mean of the gradient divided by the square root of its recent second moment, with bias corrections for both. So ultimately SGD and Adam compute the same batch gradients, but SGD’s step size (as a fraction of the gradient) is the same across all parameters, while Adam’s is allowed to vary across parameters, specifically according to the ratio mean(gradient) / sqrt(second_moment(gradient)). So I don’t see any reason why you couldn’t run Adam for a while, take its per-parameter learning rates, and apply them to an SGD optimizer for subsequent steps.
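To sketch what "take its per-parameter learning rates" could mean concretely (toy setup, all values assumed; the transferred quantity is lr / (sqrt(v_hat) + eps), Adam’s per-element multiplier of the smoothed gradient):

```python
import math

# Toy objective: f(w) = 0.5 * sum(a_i * w_i**2), grad_i = a_i * w_i,
# with very different curvature per coordinate.
A = [100.0, 1.0]
w = [1.0, 1.0]
m = [0.0, 0.0]                  # Adam first moment (mean of gradients)
v = [0.0, 0.0]                  # Adam second moment (mean of squared gradients)
lr, b1, b2, eps = 0.01, 0.9, 0.999, 1e-8
adam_steps, sgd_steps = 50, 500

def grad(w):
    return [a * x for a, x in zip(A, w)]

# Phase 1: standard Adam updates.
for t in range(1, adam_steps + 1):
    g = grad(w)
    for i in range(len(w)):
        m[i] = b1 * m[i] + (1 - b1) * g[i]
        v[i] = b2 * v[i] + (1 - b2) * g[i] ** 2
        m_hat = m[i] / (1 - b1 ** t)          # bias-corrected mean
        v_hat = v[i] / (1 - b2 ** t)          # bias-corrected second moment
        w[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)

# Phase 2: freeze Adam's per-element effective rates and hand them to
# plain SGD (no more moment updates).
frozen_lr = [lr / (math.sqrt(vi / (1 - b2 ** adam_steps)) + eps) for vi in v]
for _ in range(sgd_steps):
    g = grad(w)
    for i in range(len(w)):
        w[i] -= frozen_lr[i] * g[i]
```

Note the momentum part (m_hat) is dropped once we switch to plain SGD; only the per-element scale survives the handoff.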

If you think I’m wrong, I’d love to understand why! I looked for a few minutes and couldn’t find anyone else recommending this back-and-forth between Adam and SGD, which might mean it’s doomed to fail, or it might just mean that most folks don’t run into the kind of compute constraint you’re hitting.