I want to train on CIFAR-10 for, say, 200 epochs.

This is my optimizer:

`optimizer = optim.Adam([x for x in model.parameters() if x.requires_grad], lr=0.001)`

I want to use `OneCycleLR` as the scheduler. According to the documentation, these are its parameters:

```
torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr, total_steps=None, epochs=None, steps_per_epoch=None, pct_start=0.3, anneal_strategy='cos', cycle_momentum=True, base_momentum=0.85, max_momentum=0.95, div_factor=25.0, final_div_factor=10000.0, three_phase=False, last_epoch=-1, verbose=False)
```

The most commonly used ones seem to be `max_lr`, `epochs` and `steps_per_epoch`. The documentation says this:

**max_lr** (float or list) – Upper learning rate boundaries in the cycle for each parameter group.

**epochs** (int) – The number of epochs to train for. This is used along with `steps_per_epoch` in order to infer the total number of steps in the cycle if a value for `total_steps` is not provided. Default: None

**steps_per_epoch** (int) – The number of steps per epoch to train for. This is used along with `epochs` in order to infer the total number of steps in the cycle if a value for `total_steps` is not provided. Default: None

As for `steps_per_epoch`, I have seen many GitHub repos use `steps_per_epoch=len(data_loader)`, so, as I understand it, with a batch size of 128 this parameter is equal to 128.
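That assumption can be checked with a little arithmetic. The sketch below is not PyTorch code: `batches_per_epoch` is a hypothetical helper that mirrors what `len(data_loader)` returns for an ordinary (map-style) dataset, namely the number of batches per epoch rather than the batch size. It assumes all 50,000 CIFAR-10 training images are used; with a held-out validation split the count would be smaller.

```python
import math


def batches_per_epoch(num_samples, batch_size, drop_last=False):
    """Number of batches a loader yields per epoch (hypothetical helper)."""
    if drop_last:
        # An incomplete final batch is discarded.
        return num_samples // batch_size
    # An incomplete final batch still counts as one step.
    return math.ceil(num_samples / batch_size)


# Assumption: the full CIFAR-10 training set (50,000 images), batch size 128.
steps_per_epoch = batches_per_epoch(50_000, 128)
print(steps_per_epoch)  # 391, not 128
```

So with a batch size of 128, `len(data_loader)` would be about 391 under these assumptions, and that is the value `steps_per_epoch` is meant to receive.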

However, I do not understand the other two parameters. If I want to train for 200 epochs, should I set `epochs=200`? Or does this parameter make the scheduler run for only `epochs` epochs and then restart? For example, if I write `epochs=10` inside the scheduler but train for 200 epochs in total, does that amount to 20 complete cycles of the scheduler?
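One way to reason about this: per the documentation quoted above, `epochs` and `steps_per_epoch` are only used to infer the total number of steps in the single cycle. The sketch below compares that length with the number of `scheduler.step()` calls a 200-epoch run would make. It assumes `step()` is called once per batch and that there are 391 batches per epoch (50,000 images at batch size 128); `one_cycle_total_steps` is a hypothetical helper, not a PyTorch function.

```python
def one_cycle_total_steps(epochs, steps_per_epoch):
    """Length of the single cycle, as inferred from the two arguments."""
    return epochs * steps_per_epoch


steps_per_epoch = 391  # assumption: ceil(50_000 / 128)

schedule_steps = one_cycle_total_steps(epochs=10, steps_per_epoch=steps_per_epoch)
training_steps = 200 * steps_per_epoch  # step() calls in a 200-epoch run

print(f"cycle length with epochs=10: {schedule_steps} steps")
print(f"steps needed for 200 epochs: {training_steps} steps")
print(f"training outlasts the cycle: {training_steps > schedule_steps}")
```

As far as I know, `OneCycleLR` does not restart when the cycle ends; stepping past the total number of steps raises a `ValueError`. Under that reading, `epochs` in the scheduler should match the real number of training epochs, so `epochs=200` here.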

Then there is `max_lr`. I have seen people use a value greater than the optimizer's lr, and other people use a smaller value. I think `max_lr` must be greater than the lr (otherwise, why is it called max?).

However, if I print the learning rate epoch by epoch, it takes strange values. For example, with this setting:

```
import torch
from torch import optim
optimizer = optim.Adam([x for x in model.parameters() if x.requires_grad], lr=0.001)
scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=0.01, epochs=200, steps_per_epoch=128)
```

And this is the log, with the learning rate in the last column:

```
Epoch 1: TrL=1.7557, TrA=0.3846, VL=1.4136, VA=0.4917, TeL=1.4266, TeA=0.4852, LR=0.0004,
Epoch 2: TrL=1.3414, TrA=0.5123, VL=1.2347, VA=0.5615, TeL=1.2231, TeA=0.5614, LR=0.0004,
...
Epoch 118: TrL=0.0972, TrA=0.9655, VL=0.8445, VA=0.8161, TeL=0.8764, TeA=0.8081, LR=0.0005,
Epoch 119: TrL=0.0939, TrA=0.9677, VL=0.8443, VA=0.8166, TeL=0.9094, TeA=0.8128, LR=0.0005,
```

So the learning rate is increasing.
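The logged values can be compared against the documented relations `initial_lr = max_lr / div_factor` and `min_lr = initial_lr / final_div_factor`. The sketch below plugs in the settings from the snippet above together with the defaults from the signature (`div_factor=25`, `final_div_factor=1e4`, `pct_start=0.3`). `warmup_lr` is a simplified, hypothetical re-creation of the cosine warm-up phase, not the library's own code, and it assumes the default two-phase schedule.

```python
import math

# Settings from the question; the remaining values are the documented defaults.
max_lr = 0.01
div_factor = 25.0
final_div_factor = 1e4
pct_start = 0.3
total_steps = 200 * 128  # epochs * steps_per_epoch as passed to the scheduler

initial_lr = max_lr / div_factor          # LR at the very first step
min_lr = initial_lr / final_div_factor    # LR at the very last step
warmup_end = pct_start * total_steps - 1  # step index at which max_lr is reached


def warmup_lr(step):
    """Cosine interpolation from initial_lr up to max_lr (simplified sketch)."""
    pct = step / warmup_end
    cos_out = math.cos(math.pi * pct) + 1
    return max_lr + (initial_lr - max_lr) / 2.0 * cos_out


print(f"initial_lr = {initial_lr:.6f}")
print(f"min_lr     = {min_lr:.2e}")
print(f"warm-up ends at step {warmup_end:.0f} of {total_steps}")
print(f"LR after 119 scheduler steps = {warmup_lr(119):.6f}")
```

The starting value of `0.01 / 25 = 0.0004` matches the `LR=0.0004` printed in the first epochs, which suggests the `lr=0.001` given to the optimizer is not the starting point once the scheduler is attached. If `scheduler.step()` happens to be called once per epoch rather than once per batch (an assumption, since the training loop is not shown), only about 119 of the 25,600 scheduled steps have elapsed by epoch 119, so the learning rate would still be close to its initial value.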