Adaptive optimizer vs SGD (need for speed)

SGD supports per-parameter learning rates; you just have to pass the parameters as groups when you instantiate the optimizer, with each group carrying its own `lr`.
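Here's a minimal sketch of how that looks in PyTorch, assuming a toy two-layer model (the model and the specific learning rates are just illustrative): parameters are split into groups, each dict sets its own `lr`, and the `lr` passed to the constructor is the default for groups that omit it.

```python
import torch

# Hypothetical model purely for illustration
model = torch.nn.Sequential(
    torch.nn.Linear(10, 20),
    torch.nn.ReLU(),
    torch.nn.Linear(20, 2),
)

optimizer = torch.optim.SGD(
    [
        # This group overrides the default learning rate
        {"params": model[0].parameters(), "lr": 1e-2},
        # This group omits "lr", so it falls back to the default below
        {"params": model[2].parameters()},
    ],
    lr=1e-3,  # default lr for groups that don't set their own
)

# Each entry of optimizer.param_groups keeps its own hyperparameters,
# so you can also adjust them per group during training (e.g. for
# layer-wise schedules).
for group in optimizer.param_groups:
    print(group["lr"])
```

The same parameter-group mechanism works for other per-group hyperparameters such as `momentum` and `weight_decay`.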

Here’s another relevant thread with tips on doing this conveniently.