Adam adapts its learning rate per parameter based on the gradient updates, so I think we may not need a learning rate scheduler. However, I worry about whether, with that kind of learning rate scheduler added, Adam can still jump out of a local minimum or get away from it.
Adam can substantially benefit from a scheduled learning rate multiplier. The fact that Adam
is an adaptive gradient algorithm, and as such adapts the learning rate for each parameter,
does not rule out the possibility of substantially improving its performance by using a global
learning rate multiplier scheduled, e.g., by cosine annealing.
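As a concrete illustration, here is a minimal sketch (assuming PyTorch; the toy model, data, and hyperparameters are placeholders, not from the discussion above) of how the two mechanisms compose: Adam keeps adapting per-parameter step sizes, while a cosine annealing scheduler scales the global base learning rate.

```python
# Minimal sketch: Adam + cosine-annealed global learning rate (PyTorch assumed).
import torch
import torch.nn as nn

model = nn.Linear(10, 1)  # toy model, for illustration only
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# Cosine annealing decays the base lr from 1e-3 down to eta_min over T_max epochs.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=100, eta_min=1e-5
)

x, y = torch.randn(64, 10), torch.randn(64, 1)  # placeholder data
loss_fn = nn.MSELoss()

for epoch in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()   # Adam still adapts per-parameter step sizes
    scheduler.step()   # the scheduler rescales the global base lr each epoch
```

The two are orthogonal: Adam's per-parameter adaptation works relative to whatever base learning rate is currently in effect, and the scheduler only changes that base rate over time.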