Do not scale lr in the first stage of lr_scheduler

>>> # Assuming optimizer uses lr = 0.5 for all groups
>>> # lr = 0.05     if epoch < 30
>>> # lr = 0.005    if 30 <= epoch < 80
>>> # lr = 0.0005   if epoch >= 80
>>> scheduler = MultiStepLR(optimizer, milestones=[30,80], gamma=0.1)
>>> for epoch in range(100):
>>>     scheduler.step()
>>>     train(...)
>>>     validate(...)

In the first stage, the lr is also scaled by 0.1. This is a little bit weird. I think it should be like this:

>>> # Assuming optimizer uses lr = 0.5 for all groups
>>> # lr = 0.5     if epoch < 30  
>>> # lr = 0.05    if 30 <= epoch < 80
>>> # lr = 0.005   if epoch >= 80

When I train CIFAR10, lr=0.1 is used at the beginning; now I have to change it to 1 so that the lr_scheduler can scale it back down to 0.1. Personally, I think this is counterintuitive.

Oh, I found that it actually works as I intended. :)
But the docs are very misleading…
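For anyone else who trips over this, here is a minimal sketch I used to convince myself of the actual behaviour (it uses a dummy single-parameter SGD optimizer, not my CIFAR10 setup): the lr stays at the initial value until the first milestone and is only multiplied by gamma afterwards.

>>> import torch
>>> from torch.optim.lr_scheduler import MultiStepLR
>>>
>>> # dummy single-parameter "model", just to have something to optimize
>>> param = torch.nn.Parameter(torch.zeros(1))
>>> optimizer = torch.optim.SGD([param], lr=0.1)
>>> scheduler = MultiStepLR(optimizer, milestones=[30, 80], gamma=0.1)
>>>
>>> lrs = []
>>> for epoch in range(100):
>>>     lrs.append(optimizer.param_groups[0]['lr'])  # lr actually used this epoch
>>>     optimizer.step()
>>>     scheduler.step()
>>>
>>> # lr is 0.1 for epochs 0-29, 0.01 for 30-79, 0.001 from 80 on;
>>> # i.e. the initial lr is NOT scaled in the first stage
>>> print(lrs[0], lrs[29], lrs[30], lrs[79], lrs[80])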