Hello everyone. I came across some code written like this (I'm using simplified pseudocode to make it easier to follow):
```python
for it, image in enumerate(dataloader):
    warmup.step()                    # warmup scheduler stepped every iteration
    loss = compute_loss(model, image)  # forward pass (abstracted)
    loss.backward()
    if it % 16 == 0:                 # optimizer stepped only every 16 iterations
        optimizer.step()
```
From my understanding, stepping the warmup scheduler at every iteration like this has no real effect, because warmup changes the learning rate 16 times before optimizer.step() is ever called, and the optimizer only uses the last learning rate that was set. Is that correct?
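To make the question concrete, here is a minimal runnable sketch of the same pattern. The optimizer, scheduler, and model are just stand-ins I picked (SGD, LinearLR warmup, a dummy linear model and random data, not the actual code I came across); it prints the learning rate that is in effect right before each optimizer.step():

```python
import torch

# Dummy model, optimizer, and warmup scheduler (assumptions for illustration)
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
warmup = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=0.01, total_iters=100)

for it in range(64):
    warmup.step()                          # scheduler stepped every iteration
    loss = model(torch.randn(8, 4)).mean()  # dummy forward pass / loss
    loss.backward()
    if it % 16 == 0:
        # Only the learning rate set by the most recent warmup.step() is used here;
        # the 15 intermediate values were overwritten without ever being applied.
        print(it, optimizer.param_groups[0]["lr"])
        optimizer.step()
        optimizer.zero_grad()
```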