Advice to _overfit_ a model

If you were to overfit training of GoogLeNet using ImageNet data, what would be your strategy?

I am trying to do this and all I can get are fairly well generalized results.

My data is the original ImageNet 2012 1000-class set, and I preprocessed it by taking 3 maxspect subcrops of each image and scaling them to 224x224. This is data I have used in the past with Caffe.

I took the script

github.com/pytorch/examples/blob/main/imagenet/main.py

and turned off augmentation in the data transforms because my files are already preprocessed (a sketch of what I mean is below). I am manipulating the learning rate using StepLR or ReduceLROnPlateau.
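To be concrete, the stock script's RandomResizedCrop and RandomHorizontalFlip are gone and only tensor conversion and normalization remain. A minimal sketch (the `traindir` path is a placeholder; the normalization constants are the ones from the example script):

```python
import torchvision.datasets as datasets
import torchvision.transforms as transforms

traindir = 'train/'  # placeholder path to the preprocessed crops

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

# Files are already 224x224 crops, so no random crop or flip:
train_dataset = datasets.ImageFolder(
    traindir,
    transforms.Compose([
        transforms.ToTensor(),
        normalize,
    ]))
```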

Strategies I have tried:

StepLR - train for up to 100 epochs with combinations of step_size and gamma such that the LR is down to ~1e-6 by the 100th epoch. Gamma values like 0.1, 0.5, 0.67, 0.9, or 0.925, with step_size derived from that.

ReduceLROnPlateau - this I am using with default parameters and mode 'min'. I'm only on my first attempt with this, TBH. (A sketch of both setups follows this list.)
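Roughly what the two setups look like in code (a sketch; the SGD settings follow the example script, and the step_size/gamma pair is just one combination that lands at ~1e-6 by epoch 100):

```python
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR, ReduceLROnPlateau
from torchvision import models

model = models.googlenet(num_classes=1000, init_weights=True)

# Optimizer settings as in the example script.
optimizer = optim.SGD(model.parameters(), lr=0.1,
                      momentum=0.9, weight_decay=1e-4)

# StepLR: decay by gamma=0.1 every 20 epochs,
# so LR = 0.1 * 0.1**5 = 1e-6 by the end of epoch 100.
scheduler = StepLR(optimizer, step_size=20, gamma=0.1)

# Or ReduceLROnPlateau with defaults, stepping on validation loss:
# scheduler = ReduceLROnPlateau(optimizer, mode='min')
# ...then call scheduler.step(val_loss) once per epoch.
```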

With StepLR I have tried 500, 1000, and 5000 images/class. With ReduceLROnPlateau I'm using 1000 images/class.

Either way I get extremely well-behaved training. StepLR I can make overfit just a bit: validation accuracy falls slightly below its best result. ReduceLROnPlateau is almost too well behaved: validation accuracy plateaus and just stays there.

It is as if PyTorch is so well made that it really does not want to overfit a model! How do I defeat it?

Hi emerth!

I would train for a lot of epochs on a small dataset. ImageNet is pretty big, so I would recommend that you start with quite a small subset of the ImageNet data. Then, depending on how much you want to experiment, see how much you can increase the size of your training subset while still getting obvious overfitting in a reasonable amount of time.
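For example, here is a minimal sketch of carving a fixed per-class subset out of an ImageFolder-style dataset (the `small_subset` helper and the 'train/' path are just for illustration):

```python
import collections
import torch
from torchvision import datasets, transforms

def small_subset(dataset, per_class=10):
    # Keep only the first `per_class` samples of each class.
    counts = collections.Counter()
    keep = []
    for idx, (_, label) in enumerate(dataset.samples):
        if counts[label] < per_class:
            keep.append(idx)
            counts[label] += 1
    return torch.utils.data.Subset(dataset, keep)

full = datasets.ImageFolder('train/', transforms.ToTensor())
train_small = small_subset(full, per_class=10)  # 10 images per class
```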

If I read this correctly, you are only training for 100 epochs. For any reasonably large model, this is not much training and I wouldn't expect to see overfitting. Try training (on a small subset) for a lot longer.

As an aside, although I haven't looked at GoogLeNet, I would imagine it uses techniques to make it somewhat resistant to overfitting (such as dropout).
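If dropout does turn out to be the obstacle, one way to defeat it is to zero out the dropout probabilities after construction. A sketch, assuming the torchvision GoogLeNet:

```python
import torch.nn as nn
from torchvision import models

model = models.googlenet(num_classes=1000, init_weights=True)

# Zero every dropout probability so nothing is masked during
# training and the network is free to memorize the training set.
for m in model.modules():
    if isinstance(m, nn.Dropout):
        m.p = 0.0
```

(If I remember correctly, recent torchvision versions also expose dropout as a constructor argument, but zeroing the modules should work regardless of version.)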

Best.

K. Frank

Thanks K! I will give these approaches a try.