Learning rate not adjusting properly

This “zigzag” behavior is created when T_max is set to 1, i.e. when your train_loader returns a single batch. With T_max=1, each scheduler step completes half a cosine period, so the learning rate jumps back and forth between its maximum and minimum value, as in this example:

import torch
import numpy as np
import matplotlib.pyplot as plt

nb_batches = 1
nb_epochs = 5
optimizer = torch.optim.SGD([torch.randn(1, requires_grad=True)], lr=1.)
# T_max=1: the schedule goes from the max. to the min. lr in a single step
lrs = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=nb_batches)

l = []
for _ in range(nb_epochs):
    for _ in range(nb_batches):
        optimizer.step()
        lrs.step()
        l.append(lrs.get_last_lr())

l = np.array(l)
plt.plot(l.reshape(-1))
plt.show()

If you set nb_batches to a higher value, you’ll see a cosine wave.
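For example, here is a minimal sketch reusing the loop above with nb_batches = 100 (the exact value is arbitrary). The learning rate now anneals smoothly from 1.0 to 0.0 over one epoch and rises back over the next, so the plot traces a cosine wave:

nb_batches = 100  # arbitrary larger value
optimizer = torch.optim.SGD([torch.randn(1, requires_grad=True)], lr=1.)
lrs = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=nb_batches)

l = []
for _ in range(nb_epochs):
    for _ in range(nb_batches):
        optimizer.step()
        lrs.step()
        l.append(lrs.get_last_lr())

# one half period (max -> min) per epoch, i.e. a full cosine cycle every two epochs
plt.plot(np.array(l).reshape(-1))
plt.show()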