Nice!
Another thing is that this scheduler can be used only if all groups have the same learning rate.
I would change the method to:
def exp_lr_scheduler(optimizer, epoch, lr_decay=0.1, lr_decay_epoch=7):
    """Decay learning rate by a factor of lr_decay every lr_decay_epoch epochs."""
    if epoch % lr_decay_epoch:
        return optimizer
    for param_group in optimizer.param_groups:
        param_group['lr'] *= lr_decay
    return optimizer
You don’t need the initial learning rate as a parameter here, because the optimizer already receives it in its constructor, so every param group’s lr is already initialized to the initial learning rate.
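As a quick sanity check of the decay logic, here is a sketch that runs the scheduler against a minimal stand-in for a PyTorch optimizer (the function only touches the param_groups attribute, so a plain object with that attribute is enough). The FakeOptimizer class is purely illustrative, not part of torch:

```python
def exp_lr_scheduler(optimizer, epoch, lr_decay=0.1, lr_decay_epoch=7):
    """Decay learning rate by a factor of lr_decay every lr_decay_epoch epochs."""
    if epoch % lr_decay_epoch:
        return optimizer
    for param_group in optimizer.param_groups:
        param_group['lr'] *= lr_decay
    return optimizer

class FakeOptimizer:
    # Illustrative stand-in for torch.optim.Optimizer: only param_groups is used.
    def __init__(self, lrs):
        self.param_groups = [{'lr': lr} for lr in lrs]

# Two param groups with different initial learning rates.
opt = FakeOptimizer([0.01, 0.001])
for epoch in range(15):
    exp_lr_scheduler(opt, epoch)
# Note: 0 % lr_decay_epoch == 0, so the decay also fires at epoch 0;
# over epochs 0..14 it fires at 0, 7, and 14 (three times).
print(opt.param_groups[0]['lr'])  # ~1e-5
print(opt.param_groups[1]['lr'])  # ~1e-6
```

This also shows the benefit over a scheduler that overwrites lr from a single initial value: each group keeps its own rate, just scaled by the same factor.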