I’m trying to implement both learning rate warmup and a learning rate schedule within my training loop.

I’m currently using `LinearWarmup()` from the `pytorch_warmup` package for the warmup. It simply ramps the learning rate up from `0` to `max_lr` over a given number of steps.

I also want to use `CosineAnnealingWarmRestarts(optimizer, T_0, T_mult)` as my LR scheduler.

The challenge is that I want to use a rather long warmup period without having to use an initially high value of `T_0`. Is there a way I can make the LR scheduler become active only after X number of steps have passed? A simplified version of my code is below. It may also be beneficial to keep the LR constant for a number of steps before the scheduled decay begins.

If my initial value of `T_0` is not high enough, the LR during the warmup period follows the pattern of the LR schedule, and I’m not sure whether that behaviour is desirable.

```
import torch
import pytorch_warmup as warmup

optimizer = torch.optim.AdamW(params, lr=lr)
num_steps = len(dataloader) * num_epochs
lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=10, T_mult=2)
warmup_scheduler = warmup.LinearWarmup(optimizer, warmup_period=1000)
iters = len(dataloader)

for epoch in range(1, num_epochs + 1):
    for idx, batch in enumerate(dataloader):
        optimizer.zero_grad()
        loss = ...
        loss.backward()
        optimizer.step()
        # Dampen the LR with the warmup factor, then advance the cosine schedule.
        with warmup_scheduler.dampening():
            lr_scheduler.step(epoch + idx / iters)
```
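What I have in mind is something like the rough sketch below (untested): count optimizer steps myself and only start stepping the cosine scheduler once the warmup plus an optional constant-LR plateau have elapsed, so the decay begins from `max_lr` at that point. `hold_steps`, `delay`, and `global_step` are just names I’ve made up here; everything else is the same as above.

```
# Rough sketch (untested): delay the cosine schedule until after warmup
# plus a constant-LR plateau. `hold_steps`, `delay`, and `global_step` are
# made-up names; warmup_period matches the LinearWarmup above.
warmup_period = 1000
hold_steps = 500          # hypothetical plateau length before decay starts
delay = warmup_period + hold_steps
global_step = 0

for epoch in range(1, num_epochs + 1):
    for idx, batch in enumerate(dataloader):
        optimizer.zero_grad()
        loss = ...
        loss.backward()
        optimizer.step()
        global_step += 1
        # Enter dampening() every step so the warmup factor keeps advancing,
        # but only step the cosine scheduler once the delay has elapsed.
        # Until then the (dampened) LR stays at max_lr.
        with warmup_scheduler.dampening():
            if global_step > delay:
                lr_scheduler.step((global_step - delay) / iters)
```

The fractional argument to `step()` keeps the same epoch-based units as `T_0`, just shifted so the first cosine cycle starts right after the plateau. Is that a reasonable way to do it, or is there a more idiomatic approach?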