Learning Rate Schedulers with multiple Google Cloud TPU cores

Hi All!
I wanted to use a cosine-annealing-with-restarts scheduler (torch.optim.lr_scheduler.CosineAnnealingWarmRestarts) on Google Cloud TPU cores, with distributed parallel training.
I couldn't find a tutorial that shows how to do this.

Please help me out.

Hey @chhablanig

Do you have a working training script without the LR scheduler?

cc @vincentqb for optimizer question
cc @ailzhang for XLA question
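In the meantime, one thing worth noting: the scheduler itself should be device-agnostic, since LR schedulers only rewrite optimizer.param_groups on the host. So in a torch_xla loop you would typically call xm.optimizer_step(optimizer) and then scheduler.step() as usual. Below is a minimal pure-Python sketch of the schedule that CosineAnnealingWarmRestarts implements (the eta_min, eta_max, T_0, and T_mult values are illustrative, not a recommendation):

```python
import math

def cosine_warm_restart_lr(eta_min, eta_max, t_cur, t_i):
    # LR at position t_cur inside a restart cycle of length t_i:
    # eta_min + (eta_max - eta_min) * (1 + cos(pi * t_cur / t_i)) / 2
    return eta_min + (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / t_i)) / 2

# Simulate two cycles with T_0 = 5 and T_mult = 2 (cycle lengths 5, then 10).
lrs = []
t_i, t_cur = 5, 0
for step in range(15):
    lrs.append(cosine_warm_restart_lr(1e-4, 1e-1, t_cur, t_i))
    t_cur += 1
    if t_cur >= t_i:   # restart: reset position, grow the next cycle by T_mult
        t_cur = 0
        t_i *= 2
```

At each restart (here, step 5) the LR jumps back up to eta_max and then anneals down again over the longer cycle, which is the "warm restart" behavior the question is after.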