How to avoid overfitting in PyTorch?

will_soon · May 5, 2018, 1:15pm

I have 20,580 dog images downloaded from http://vision.stanford.edu/aditya86/ImageNetDogs/
The dataset has 120 classes, with between 120 and 180 images per class. I use 16,418 images (about 80% of the dataset) as the training set.

During training, the training accuracy reaches 0.96, but the validation accuracy is only 0.7 (after 50 epochs). The model is obviously overfitting.

So I tried several ways to address the problem:

1. Reduce the size of my network. I switched from a pretrained (ImageNet) ResNet50 to ResNet18, but it only improved a little.
2. Data augmentation. For the training set, I used random horizontal flip, random brightness, random shift, and random rotation. As the last step, I resized the images to 384x384 (because the images vary in size: some are 180x120, some are 500x400, and so on), but it does not help much with overfitting.
3. Add weight decay. I tried weight_decay values of 1e-5, 5e-4, 1e-4, and 1e-3; 1e-5 and 1e-4 improved things a little. The training accuracy is 0.85 and the validation accuracy is 0.65 (after 7 epochs).

I am confused about how to prevent overfitting. I even wonder whether the dataset is suitable for classification.

I hope you can offer some advice.

Thanks.

peter · May 5, 2018, 10:35pm

I guess this is more of a generic answer than one specific to PyTorch, but two things come to mind:

1. If you reuse a pre-trained network, make sure you "freeze" most of the earlier layers when you train on your images. That is, only train the last few layers (typically the fully connected layers) and don't update the weights of the earlier layers. Possibly you are already doing this, but if not, I would suggest trying this first.
2. I like to use dropout as a way to prevent overfitting. In my personal experience, the default dropout value of 0.5 is a good starting point. You could add dropout layers at the end, between the fully connected layers.
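
A minimal sketch of both suggestions, assuming the model exposes its final classifier as an attribute named `fc` (as torchvision's ResNet models do). The helper name `freeze_backbone_and_replace_head` and the `TinyBackbone` class are hypothetical; the small stand-in network is used only so the example runs without downloading pretrained weights. The optimizer settings are likewise assumptions.

```python
import torch
from torch import nn


def freeze_backbone_and_replace_head(model, num_classes, dropout_p=0.5):
    """Freeze every existing parameter, then attach a new trainable head.

    Assumes `model.fc` is an nn.Linear, as in torchvision's ResNet models.
    """
    for param in model.parameters():
        param.requires_grad = False  # no gradient updates for earlier layers

    in_features = model.fc.in_features
    # New modules are trainable by default (requires_grad=True).
    model.fc = nn.Sequential(
        nn.Dropout(p=dropout_p),
        nn.Linear(in_features, num_classes),
    )
    return model


class TinyBackbone(nn.Module):
    """Hypothetical stand-in for a pretrained network with an `fc` head."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.fc = nn.Linear(8, 1000)

    def forward(self, x):
        return self.fc(self.features(x))


model = freeze_backbone_and_replace_head(TinyBackbone(), num_classes=120)

# Names of the parameters that will still be updated.
trainable = [name for name, p in model.named_parameters() if p.requires_grad]

# Pass only the trainable parameters to the optimizer (settings are assumed).
optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=0.01, momentum=0.9
)

logits = model(torch.randn(2, 3, 32, 32))
print(trainable, logits.shape)
```

With a real torchvision ResNet, the same helper would be applied to the loaded pretrained model instead of `TinyBackbone`. Remember to call `model.eval()` when measuring validation accuracy so that dropout is disabled.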

will_soon · May 6, 2018, 1:11am

Thanks for your advice.

I just use the pre-trained weights to initialize the new network and retrain the whole network, instead of freezing most of the earlier layers.
So far, I haven't applied dropout in the network.

Thanks again for your advice. I will post my results after trying both suggestions.

will_soon · May 6, 2018, 6:18am

Yes, you are right.

I tried the two approaches you suggested, and the results improved a lot.

Now I have two results:

1. I froze all the earlier layers and only updated the last fully connected layers (ResNet18). No dropout. Train accuracy: 0.80; val accuracy: 0.76
2. I froze all the earlier layers and only updated the last fully connected layers (ResNet34). I also added a dropout layer (p = 0.5) before the fully connected layers. Train accuracy: 0.90; val accuracy: 0.81

Both are better than my first result. Thanks for your advice.

peter · May 6, 2018, 6:39am

Good to hear that it improved the accuracy score of the validation set (and I assume reduced the training time significantly since the back-propagation involves fewer learnable parameters).
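
To see how much freezing shrinks the set of learnable parameters, one could count them as in the sketch below. The helper `count_parameters` and the two-layer stand-in model are hypothetical and chosen only for illustration; the layer sizes are not taken from the thread.

```python
from torch import nn


def count_parameters(model):
    """Return (trainable, total) parameter counts for a model."""
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return trainable, total


# Hypothetical stand-in: an "earlier" layer followed by a 120-class head.
model = nn.Sequential(
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Linear(256, 120),
)

# Freeze the earlier layer so only the head receives gradient updates.
for param in model[0].parameters():
    param.requires_grad = False

trainable, total = count_parameters(model)
print(f"trainable: {trainable} of {total} parameters")
```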

basketballandlearn · August 20, 2018, 8:18am

May I see your ResNet18 code? I really need it, because my ResNet18 did not work well.