```
>>> # Assuming optimizer uses lr = 0.5 for all groups
>>> # lr = 0.05 if epoch < 30
>>> # lr = 0.005 if 30 <= epoch < 80
>>> # lr = 0.0005 if epoch >= 80
>>> scheduler = MultiStepLR(optimizer, milestones=[30,80], gamma=0.1)
>>> for epoch in range(100):
>>>     scheduler.step()
>>>     train(...)
>>>     validate(...)
```

In the first stage, the `lr` is also scaled by `0.1`. This is a little bit weird.

I think it should be like:

```
>>> # Assuming optimizer uses lr = 0.5 for all groups
>>> # lr = 0.5 if epoch < 30
>>> # lr = 0.05 if 30 <= epoch < 80
>>> # lr = 0.005 if epoch >= 80
```
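
To make the expected behaviour concrete, here is a minimal sketch in plain Python. It is not PyTorch's actual implementation, and `expected_lr` is a hypothetical helper name; it only shows the schedule I would expect, where the rate is decayed once for each milestone the epoch has reached:

```python
from bisect import bisect_right


def expected_lr(base_lr, milestones, gamma, epoch):
    """Decay base_lr by gamma once for each milestone that epoch has reached."""
    # bisect_right counts how many milestones are <= epoch,
    # so no decay is applied before the first milestone.
    return base_lr * gamma ** bisect_right(sorted(milestones), epoch)


# Print the rate on either side of each milestone.
for epoch in (0, 29, 30, 79, 80, 99):
    print(epoch, expected_lr(0.5, [30, 80], 0.1, epoch))
```

With `base_lr=0.5`, this leaves the rate untouched until epoch 30, then applies `gamma` once at 30 and again at 80.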

When I train CIFAR10, `lr=0.1` is used at the beginning. Now I have to change it to `1` so that the `lr_scheduler` can scale it back to `0.1`. Personally, I think this is counterintuitive.