Can PyTorch provide some sample code for training ResNet?

I recently need to train ResNet50 and do some experiments. I know there are a bunch of pretrained models on GitHub, but I am more interested in the training process (how to preprocess, how to set the LR, and so on)…

The dataset I am using is the standard CIFAR-100. On GitHub, a few repos claim they can achieve a top-1 error of 22% (i.e., 78% accuracy) with ResNet50.

I simply cannot get good results with ResNet; my best so far is only 66%. I read the paper, and they warm up the learning rate, which I also do… I feel ResNet is really hard to train well.

So I am wondering… it’s OK if we don’t have pretrained models for these kinds of datasets, but could PyTorch provide some training or preprocessing scripts for them that we can modify and train on our own?

You can refer to this, https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html#sphx-glr-beginner-blitz-cifar10-tutorial-py

This is a bit too basic for what I’m after… :smile:

Review the Bag of Tricks for Image Classification with Convolutional Neural Networks for some pointers.

Preprocessing: Zero-pad by 4 pixels and then take a random 32x32 crop. For normalization use mean=[0.491, 0.482, 0.447] and std=[0.247, 0.243, 0.261]. For data augmentation, use horizontal flips and maybe rotation. There are a lot of options here, but these are the basic ones.
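Something like this, as a minimal sketch using torchvision (the dataset root and download flag are just placeholders):

```python
import torchvision.transforms as T
from torchvision.datasets import CIFAR100

normalize = T.Normalize(mean=[0.491, 0.482, 0.447],
                        std=[0.247, 0.243, 0.261])

train_transform = T.Compose([
    T.RandomCrop(32, padding=4),   # zero-pad by 4 pixels, then take a random 32x32 crop
    T.RandomHorizontalFlip(),      # basic augmentation; add rotation etc. if you like
    T.ToTensor(),
    normalize,
])

test_transform = T.Compose([
    T.ToTensor(),
    normalize,
])

train_set = CIFAR100(root="./data", train=True, download=True, transform=train_transform)
test_set = CIFAR100(root="./data", train=False, download=True, transform=test_transform)
```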

Learning rate: Assuming you are starting from random weights, initialize them with He init. Do a warmup for the first 5 epochs (linearly increase the learning rate from 0 to its final value at a constant step size). Then you can use the step decay method, where the learning rate is dropped by a factor of 10 every 30 epochs. Cosine annealing is also a good alternative.
I generally prefer cyclic learning rates.
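A minimal sketch of the warmup plus step decay, with both folded into one LambdaLR function (the model, weight decay, and epoch count are placeholder assumptions, not fixed requirements):

```python
import torch
import torchvision

model = torchvision.models.resnet50(num_classes=100)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)

def lr_lambda(epoch):
    if epoch < 5:                       # linear warmup over the first 5 epochs
        return (epoch + 1) / 5
    return 0.1 ** ((epoch - 5) // 30)   # drop the LR by 10x every 30 epochs afterwards

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
# Alternative: torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_epochs)

for epoch in range(95):                 # placeholder epoch count
    # ... run one training epoch here ...
    scheduler.step()
```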

Use a batch size of 32. You can go higher, but 32 should be sufficient. Use SGD with momentum as the optimizer. You can also test AdamW.
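For example (reusing train_set and model from the sketches above; the AdamW hyperparameters are assumptions, not values from this thread):

```python
import torch
from torch.utils.data import DataLoader

train_loader = DataLoader(train_set, batch_size=32, shuffle=True, num_workers=4)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
# alternative worth testing:
# optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=1e-2)
```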

There are many more techniques, like Multi-Sample Dropout, the Mish activation function, the LookAhead optimizer with RAdam, and multi-sample data augmentation.

The implementation is straightforward; just start with one step at a time and you will be fine.


Thanks for the suggestion.

I did the preprocessing you mentioned and also warmed up the learning rate for the first 5 epochs; I even tried warming up for 20 epochs. My batch size is 128 since it feels more stable than 64 or 32. The optimizer is also the same, SGD with momentum. The training scheme is step decay with a factor of 0.2 at epochs [60, 120, 160], with an initial LR of 0.1, roughly as in the sketch below.
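Roughly like this (the weight decay and total epoch count are just placeholders):

```python
import torch
import torchvision

model = torchvision.models.resnet50(num_classes=100)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)   # weight decay is a placeholder
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[60, 120, 160], gamma=0.2)

for epoch in range(200):  # placeholder epoch count
    # ... one training epoch with batch size 128 (warmup handled separately for the first epochs) ...
    scheduler.step()
```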

So… I still cannot reach 70%, and I have been trying for a few weeks already… :sweat_smile:

Yeah, I should try initializing the weights differently and see if it helps.

That will not help much if you already used He init.
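In case you are not already doing it this way, here is roughly what He (Kaiming) init looks like in PyTorch; the fan mode and nonlinearity choices below are common defaults, not something specific to your setup:

```python
import torch.nn as nn

def he_init(model):
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            nn.init.kaiming_normal_(m.weight, mode="fan_out", nonlinearity="relu")
            if m.bias is not None:
                nn.init.zeros_(m.bias)
        elif isinstance(m, nn.BatchNorm2d):
            nn.init.ones_(m.weight)
            nn.init.zeros_(m.bias)
        elif isinstance(m, nn.Linear):
            nn.init.kaiming_normal_(m.weight, nonlinearity="relu")
            nn.init.zeros_(m.bias)

he_init(model)  # e.g. a torchvision ResNet-50 with num_classes=100
```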

It helped :blush: Thanks for the reminder. The model works well now.


You mean that with He init, your model went from 66% to 70%?

Not that dramatic… I updated all my conda packages to the latest versions and introduced some data augmentation, which got me to 68.3%; then I tried He init and it gave me 70.2% top-1 accuracy, which is quite close to the 22-28% top-1 error reported by almost all the GitHub repos…
The benefit of He init for me is that the model converges more stably; in my previous runs training was quite unstable, and even with warmup the accuracy would get stuck…