Warmup learning rate and optimizer

Hello everyone. I came across some code written like this (I'm using pseudocode to make it easier to follow):

for step, image in enumerate(loader):
    warmup.step()         # LR scheduler stepped on every iteration
    loss.backward()
    if step % 16 == 0:
        optimizer.step()  # parameters updated only every 16th iteration

From my understanding, stepping warmup on every iteration like this is mostly wasted: warmup changes the learning rate 16 times between calls to optimizer.step(), but the optimizer only ever applies the last value that was set. Is that correct?
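
To make the question concrete, here is a runnable toy version of the loop (the model, data, and warmup schedule are all made up for illustration; warmup stands in for whatever scheduler the real code used):

import torch

model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1.0)
# Stand-in warmup: scale the LR up linearly over 100 scheduler steps.
warmup = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda s: min(1.0, (s + 1) / 100))

for step in range(32):
    warmup.step()                                  # LR changes every iteration...
    loss = model(torch.randn(8, 4)).pow(2).mean()  # stand-in loss
    loss.backward()
    if step % 16 == 0:
        # ...but only the LR in effect right here reaches the weights.
        print(step, optimizer.param_groups[0]["lr"])
        optimizer.step()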

The optimizer reads the learning rate of each group in optimizer.param_groups at the moment optimizer.step() is called. If warmup.step() changes those values, then yes, only the last value set before optimizer.step() actually affects the update, so your understanding is correct.
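
If the intent is gradient accumulation, a common fix (sketched below with the same made-up setup as above) is to step the scheduler only when the optimizer actually updates the weights, and to zero the gradients after each update. Note that the original pseudocode also never calls optimizer.zero_grad(), so gradients would keep accumulating across updates:

import torch

model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1.0)
warmup = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda s: min(1.0, (s + 1) / 100))

accum = 16
for step in range(64):
    loss = model(torch.randn(8, 4)).pow(2).mean()  # stand-in loss
    (loss / accum).backward()      # scale so the accumulated gradient averages
    if (step + 1) % accum == 0:
        optimizer.step()           # uses the current LR in param_groups
        optimizer.zero_grad()      # don't carry gradients into the next update
        warmup.step()              # one scheduler step per real update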
