This “zigzag” behavior occurs if T_max is set to 1, i.e. if your train_loader returns only a single batch and the scheduler is stepped once per batch, as seen here:
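import matplotlib.pyplot as plt
import numpy as np
import torch

nb_batches = 1  # batches per epoch, i.e. len(train_loader)
nb_epochs = 5

# Dummy parameter so the optimizer has something to hold
optimizer = torch.optim.SGD([torch.randn(1, requires_grad=True)], lr=1.)
lrs = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=nb_batches)

l = []
for _ in range(nb_epochs):
    for _ in range(nb_batches):
        optimizer.step()
        lrs.step()  # scheduler is stepped once per batch
        l.append(lrs.get_last_lr())

l = np.array(l)
plt.plot(l.reshape(-1))
plt.show()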
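If you set nb_batches to a higher value, you’ll see a cosine wave, since the learning rate then takes T_max steps to go from its maximum to its minimum instead of jumping between the two.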
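To see why T_max=1 produces the zigzag, here is a rough sketch that evaluates the closed-form cosine annealing formula from the PyTorch docs directly, without calling the scheduler. The helper name cosine_lr and its defaults (base_lr=1.0, eta_min=0.0) are made up for this illustration and are not part of PyTorch; it only mirrors the values the scheduler would report in the example above.

```python
import math


def cosine_lr(step, t_max, base_lr=1.0, eta_min=0.0):
    """Closed-form cosine annealing value after `step` scheduler steps.

    Hypothetical helper for illustration only, not a PyTorch API.
    """
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * step / t_max)) / 2


# T_max = 1: the cosine is sampled only at its troughs and peaks,
# so the learning rate flips between eta_min and base_lr.
zigzag = [round(cosine_lr(t, t_max=1), 6) for t in range(1, 6)]

# T_max = 10: ten samples per half-period trace out a smooth curve.
smooth = [round(cosine_lr(t, t_max=10), 6) for t in range(1, 21)]

print(zigzag)       # [0.0, 1.0, 0.0, 1.0, 0.0]
print(smooth[:10])  # decreases gradually from just below 1.0 down to 0.0
```

With t_max=1 every step lands on an extreme of the cosine, which is the zigzag in the plot. With a larger t_max the same formula is sampled more densely and the wave becomes visible.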