I have a quick question: in section 4.2 of the ResNet paper, an architecture is described for the CIFAR-10 dataset. When describing the very deep ResNets, the authors write:

So we use 0.01 to warm up the training until the training error is below
80% (about 400 iterations), and then go back to 0.1 and continue training.

Therefore, I would like to have the following learning rate schedule:

LR = 10^{-2} for the first two epochs,

LR = 10^{-1} for the rest of the training.

However, I haven’t found a way to achieve exactly this (cyclical learning rates only increase the learning rate gradually, not all at once). Any help would be appreciated. (-:
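To make the goal concrete, here is a minimal sketch of the behavior I’m after, just setting the learning rate by hand inside the training loop (the model, optimizer, and epoch count are placeholders; I’d prefer a proper scheduler, of course):

```python
import torch

# toy model and optimizer, purely for illustration
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

num_epochs = 10  # placeholder
for epoch in range(num_epochs):
    # jump from 0.01 to 0.1 after the first two epochs, all at once
    lr = 0.01 if epoch < 2 else 0.1
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr
    # ... forward / backward / optimizer.step() would go here ...
```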

First of all: thanks! There’s one thing I forgot to mention in my original post, I’m afraid. Let me quote from section 4.2 of the ResNet paper again:

So we use 0.01 to warm up the training until the training error is below
80% (about 400 iterations), and then go back to 0.1 and continue training. The rest of the learning schedule is as done previously.

By “as done previously”, the authors mean:

We start with a learning rate of 0.1, divide it by 10 at 32k and 48k iterations, and
terminate training at 64k iterations, […].

Basically, our learning rate schedule should look like this:

lr = 0.01 if num_iters < 400,
lr = 0.1 if 400 <= num_iters < 32k,
lr = 0.01 if 32k <= num_iters < 48k,
lr = 0.001 if 48k <= num_iters < 64k.

I’d also be happy to use epochs instead of number of iterations, but I’m not sure how to achieve either. )-:
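One straightforward way to get this piecewise-constant schedule (a sketch, not necessarily the idiomatic one) would be a small helper function queried every iteration; the model and optimizer below are placeholders:

```python
import torch

def get_lr(num_iters):
    # piecewise-constant schedule as described in section 4.2 of the ResNet paper
    if num_iters < 400:
        return 0.01   # warm-up
    elif num_iters < 32000:
        return 0.1
    elif num_iters < 48000:
        return 0.01
    else:
        return 0.001

model = torch.nn.Linear(10, 1)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=get_lr(0))

for num_iters in range(64000):
    for param_group in optimizer.param_groups:
        param_group['lr'] = get_lr(num_iters)
    # ... forward / backward / optimizer.step() would go here ...
```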

If I can ask a brief follow-up question: I had actually tried using LambdaLR for this problem as well, but in the end, I didn’t really know how to implement it. I’d be happy if you showed me how! (-:
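For reference, here is a minimal sketch of how `torch.optim.lr_scheduler.LambdaLR` could express the schedule above. Note that `LambdaLR` multiplies the optimizer’s base learning rate by whatever factor the lambda returns, so with a base learning rate of 0.1 the lambda returns multiplicative factors, and `scheduler.step()` is called once per iteration rather than per epoch (the model is a placeholder):

```python
import torch
from torch.optim.lr_scheduler import LambdaLR

model = torch.nn.Linear(10, 1)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # base lr = 0.1

def lr_lambda(num_iters):
    # LambdaLR multiplies the base lr (0.1) by the returned factor
    if num_iters < 400:
        return 0.1    # 0.1 * 0.1  = 0.01 (warm-up)
    elif num_iters < 32000:
        return 1.0    # 0.1
    elif num_iters < 48000:
        return 0.1    # 0.01
    else:
        return 0.01   # 0.001

scheduler = LambdaLR(optimizer, lr_lambda=lr_lambda)

for num_iters in range(64000):
    # ... forward / backward / optimizer.step() would go here ...
    scheduler.step()  # step once per iteration, not per epoch
```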