CosineAnnealingWarmRestarts T_0

Hi Folks,

I just want to confirm my understanding of the T_0 argument.
Let's say I train for 500 epochs and my data loader has length 97:

loader_data_size = 97  # len(self._train_loader)
for epoch in range(epochs):
    self.state.epoch = epoch  # in my case this happens elsewhere, so I track the epoch in state
    for batch_idx, batch in enumerate(self._train_loader):
        # same calculation as in the docs example
        next_step = self.state.epoch + batch_idx / loader_data_size
        scheduler.step(next_step)
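
For reference, scheduler above is created roughly like this (just a minimal sketch; the SGD optimizer, model, and lr are placeholders, only T_0 = 97 matters here):

import torch
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

model = torch.nn.Linear(10, 2)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # placeholder optimizer / lr
scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=97)  # restart period as in my question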

If I understand the semantics correctly, the LR anneals at every batch, and since I set T_0 = 97 it restarts at the end of each epoch. If that is the case, it implies that the last batch of every epoch always gets the lowest LR. Is my understanding correct? If so, do I need to shift (slide) the cosine function somehow to fix that behavior, or will it slide by itself? I.e., is the LR at batch_idx = 96 in epoch 1 the same as at batch_idx = 96 in epoch 0, or does it slide?
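
For concreteness, here is a small standalone sketch of the check I have in mind (dummy parameter and optimizer, not my real training loop), printing the LR at batch_idx = 96 for the first two epochs:

import torch
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

loader_data_size = 97
param = torch.zeros(1, requires_grad=True)
optimizer = torch.optim.SGD([param], lr=0.1)
scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=97)

for epoch in range(2):
    for batch_idx in range(loader_data_size):
        # no actual training here, just stepping the scheduler with the fractional epoch
        scheduler.step(epoch + batch_idx / loader_data_size)
        if batch_idx == 96:
            print(epoch, batch_idx, optimizer.param_groups[0]["lr"])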

Thank you,
Mus