I was trying to replicate the ResNet-110 result on CIFAR-10 reported in Deep Residual Learning for Image Recognition. For ResNets with fewer than 110 layers, the paper says:
We start with a learning rate of 0.1, divide it by 10 at 32k and 48k iterations, and terminate training at 64k iterations, which is determined on a 45k/5k train/val split.
and for the ResNet with 110 layers it says:
In this case, we find that the initial learning rate of 0.1 is slightly too large to start converging. So we use 0.01 to warm up the training until the training error is below 80% (about 400 iterations), and then go back to 0.1 and continue training. The rest of the learning schedule is as done previously.
How can I change the learning rate according to the training error?
My guess is that they used the warm-up learning rate of 0.01 only until the training error dropped below 80%, which happened to take about 400 iterations.
For the warm-up strategy, torchvision’s example code may help.
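To make the schedule concrete, here is a minimal framework-agnostic sketch of the learning-rate rule the paper describes (0.01 warm-up until training error drops below 80%, then 0.1, divided by 10 at 32k and 48k iterations). The function name `lr_for_iteration` and the `warmed_up` flag are my own; in a real training loop you would flip `warmed_up` once your running training error falls below 80% and assign the returned value to the optimizer's param groups each iteration.

```python
def lr_for_iteration(iteration: int, warmed_up: bool) -> float:
    """Learning rate per the paper's ResNet-110/CIFAR-10 schedule.

    warmed_up: True once the training error has dropped below 80%
    (the paper observed this takes about 400 iterations).
    """
    if not warmed_up:
        return 0.01          # warm-up phase
    if iteration < 32000:
        return 0.1           # standard initial rate after warm-up
    if iteration < 48000:
        return 0.01          # first /10 drop at 32k iterations
    return 0.001             # second /10 drop at 48k; train until 64k
```

In a PyTorch loop you would apply it with something like `for g in optimizer.param_groups: g["lr"] = lr_for_iteration(it, warmed_up)`.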
FYI, I reproduced the accuracy of ResNet-110 for CIFAR-10 even without the warm-up strategy, and the code + trained model weights are publicly available as part of torchdistill.
See this page for the reproduced result. Google Colab examples to reproduce the trends are also available here.
@yoshitomo-matsubara thanks very much. I have one more doubt; it would really help if you could clarify it.
The official ResNet paper reports the model’s score on CIFAR-10 in a column named “error%”. I am not sure how that score is calculated. My current guess is that it is the mean of the top-1 accuracy ± the standard deviation of the top-1 accuracy. If not, can you guide me on how to calculate the value in the “error%” column?
Hi @abhinav_hari ,
I think the number after ± that you mentioned is the standard deviation (over 5 runs), as explained in the table caption:
Table 6. Classification error on the CIFAR-10 test set. All methods are with data augmentation. For ResNet-110, we run it 5 times and show “best (mean±std)” as in .
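In other words, error% is just 100 minus the top-1 accuracy, and for ResNet-110 the table shows the best run together with mean ± standard deviation over 5 runs. A small sketch with made-up placeholder accuracies (not the paper's numbers):

```python
import statistics

# Hypothetical top-1 accuracies (%) from 5 independent runs;
# placeholder values, not results from the paper.
accuracies = [93.0, 93.5, 93.2, 93.4, 93.1]

# error% = 100 - top-1 accuracy
errors = [100.0 - acc for acc in accuracies]

best = min(errors)                 # best (lowest) error over the runs
mean = statistics.mean(errors)     # mean error
std = statistics.stdev(errors)     # sample std. dev. over the 5 runs

# Reported in the paper's "best (mean±std)" format
print(f"{best:.2f} ({mean:.2f}\u00b1{std:.2f})")
```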