How to handle overfitting properly

NextDoorToHell · October 4, 2022, 11:39pm

I am training a CNN regression model on a dataset of 34,560 samples. The training error rate is already below 5%, but the validation error rate is over 60%, which looks like an overfitting problem. I have tried the following four approaches, but none of them works well:

Increase the dataset size
Reduce the model complexity
Add a dropout layer before the output layer
Use L2 regularization / weight decay
I have probably not applied them correctly. Could someone explain these methods in more detail? Are there any other ways to address overfitting?
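
For reference, here is a minimal sketch of how the dropout and weight decay items in the list above are commonly wired up in PyTorch. The layer sizes, dropout rate, learning rate, and weight decay value are placeholders chosen for illustration, not values taken from the model in question.

```python
import torch
from torch import nn

# Hypothetical regression head; all sizes here are placeholders.
head = nn.Sequential(
    nn.Flatten(),
    nn.Linear(64, 128),
    nn.ReLU(),
    nn.Dropout(p=0.3),  # dropout placed right before the output layer
    nn.Linear(128, 80),
)

# weight_decay adds L2-style regularization; AdamW applies it decoupled
# from the gradient-based update.
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3, weight_decay=1e-2)

x = torch.randn(16, 64)

head.train()  # dropout active: repeated passes over the same batch differ
train_a, train_b = head(x), head(x)

head.eval()  # dropout disabled: the output is deterministic
with torch.no_grad():
    eval_a, eval_b = head(x), head(x)
```

It may be worth confirming that `eval()` is called before computing the validation error; if dropout stays active during validation, the reported number is noisier than it should be.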

J_Johnson · October 5, 2022, 2:24am

If you always send the training data into the model the same way, you’ll likely have overfitting.

Augmentations are a way to “extend” your dataset and build a more robust model. This is basically done by altering the images on the fly with crops, filters, flips, rotations, etc.

See here:
https://pytorch.org/vision/main/transforms.html
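
As a rough illustration of "on the fly", the sketch below wraps a set of tensors in a `Dataset` that applies a random horizontal flip every time a sample is read, so the model rarely sees exactly the same input twice. The class name `AugmentedDataset`, the tensor shapes, and the flip probability are assumptions made up for this example; it uses plain `torch` rather than the torchvision transforms linked above.

```python
import torch
from torch.utils.data import Dataset

class AugmentedDataset(Dataset):
    """Wraps tensors and applies a random flip each time a sample is read."""

    def __init__(self, images, targets, p_flip=0.5):
        self.images = images    # shape (N, C, H, W)
        self.targets = targets  # shape (N, ...)
        self.p_flip = p_flip

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        image = self.images[idx]
        # Flip left-right with probability p_flip. The target is returned
        # unchanged, which is only valid if flipping does not alter it.
        if torch.rand(1).item() < self.p_flip:
            image = torch.flip(image, dims=[-1])
        return image, self.targets[idx]

images = torch.rand(8, 1, 128, 128)
targets = torch.rand(8, 80)
dataset = AugmentedDataset(images, targets)
image, target = dataset[0]
```

Because the flip is decided per access, wrapping this in a `DataLoader` gives a different mix of flipped and unflipped samples each epoch without storing any extra data.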

NextDoorToHell · October 5, 2022, 10:08am

Yes, data augmentation works well for classification. But my task is regression, and I have never tried augmentation on a regression problem. Could you explain more about data augmentation for regression tasks?

J_Johnson · October 5, 2022, 11:47am

The appropriate augmentations to use will be determined by the target of the regression model. For example, if your target was to find the distance measured between two points, rotation, hue, brightness, etc. would be appropriate to augment. But you probably shouldn’t crop or shear.

The question ought to be, will this type of augmentation materially change the target? Nevertheless, it’s still a good idea to incorporate appropriate forms of augmentation. It prevents your model from memorizing irrelevant features in the data.
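
To make the distance example concrete, here is a small sketch that checks whether a transform changes the target. The point coordinates, the 30-degree angle, and the shear factor are arbitrary values picked for illustration.

```python
import math
import torch

def distance(points):
    """Euclidean distance between two 2D points, given as a (2, 2) tensor."""
    return torch.linalg.norm(points[0] - points[1])

points = torch.tensor([[10.0, 20.0], [40.0, 60.0]])

theta = math.radians(30.0)
rotation = torch.tensor([[math.cos(theta), -math.sin(theta)],
                         [math.sin(theta), math.cos(theta)]])
shear = torch.tensor([[1.0, 0.5],
                      [0.0, 1.0]])

original = distance(points)
rotated = distance(points @ rotation.T)  # rotation preserves lengths
sheared = distance(points @ shear.T)     # shear stretches lengths

# Compare each transformed target against the original one.
rotation_keeps_target = bool(torch.isclose(original, rotated, atol=1e-3))
shear_keeps_target = bool(torch.isclose(original, sheared, atol=1e-3))
```

Here rotation leaves the distance unchanged while shear does not, so only rotation can be applied without also recomputing the label.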

NextDoorToHell · October 8, 2022, 2:57am

My task is to simulate an optical grating. The inputs are 8 coefficients and a 128x128 binary image, and the outputs are 80 numbers.
First, I expand the 8 coefficients into a 128x128 matrix. Then I apply a convolution layer followed by average pooling, twice, to both the matrix and the binary image, and concatenate the results to get a 64x32x32 tensor. After five more rounds of convolution and average pooling on that tensor, I flatten it and apply a fully connected (FC) layer to get the 80 outputs. This is basically what my model looks like.
So what kinds of data augmentation do you recommend for my model?
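
For readers following along, the architecture described above might be sketched roughly as follows. Several details are assumptions, since the post does not specify them: how the 8 coefficients are expanded (here each one becomes a constant 128x128 channel), the kernel size, the activation, and the channel widths (here 32 per branch, giving 64 after concatenation). The name `GratingNet` is made up for this example.

```python
import torch
from torch import nn

def conv_pool(in_ch, out_ch):
    """One convolution + average pooling stage; halves the spatial size."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.AvgPool2d(2),
    )

class GratingNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Two stages per branch: 128x128 -> 32x32, 32 channels each (assumed).
        self.coeff_branch = nn.Sequential(conv_pool(8, 16), conv_pool(16, 32))
        self.image_branch = nn.Sequential(conv_pool(1, 16), conv_pool(16, 32))
        # Five stages on the 64x32x32 tensor: 32x32 -> 1x1.
        self.trunk = nn.Sequential(*[conv_pool(64, 64) for _ in range(5)])
        self.fc = nn.Linear(64, 80)

    def forward(self, coeffs, image):
        # Assumed expansion: one constant 128x128 channel per coefficient.
        coeff_map = coeffs[:, :, None, None].expand(-1, -1, 128, 128)
        features = torch.cat(
            [self.coeff_branch(coeff_map), self.image_branch(image)], dim=1
        )
        return self.fc(torch.flatten(self.trunk(features), start_dim=1))

model = GratingNet()
coeffs = torch.randn(4, 8)
image = torch.randint(0, 2, (4, 1, 128, 128)).float()
output = model(coeffs, image)
print(output.shape)  # torch.Size([4, 80])
```

Written this way, it is clear that any spatial augmentation touches only the `image` input, so the coefficients and the 80 targets have to remain valid for the transformed image.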

J_Johnson · October 8, 2022, 4:33am

Ask yourself this:

If I performed ______ augmentation on the image, would I still be able to determine what the 80 outputs (targets) should be (with sufficient training)?
And would the targets still be the same?

Take rotation, for example. Would rotating the image n degrees alter what the targets should be? If not, then this would be a good augmentation to use. Rotations alone, in 1-degree steps, can multiply your effective dataset size by 360.

Then go through each augmentation and determine if it would satisfy those conditions.
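
One way to go through the candidates systematically is to run each one through whatever produces the targets and compare the results. The sketch below does that with a toy `simulate` function that is purely hypothetical (it returns the mean of each image column); it stands in for the real grating simulation, which is not shown in this thread.

```python
import torch

def simulate(image):
    """Toy stand-in for the real simulation (hypothetical): the mean of
    each column, so the result depends only on the horizontal layout."""
    return image.float().mean(dim=0)

# Candidate augmentations that are exact on a binary image (no interpolation).
candidates = {
    "vertical_flip": lambda img: torch.flip(img, dims=[0]),
    "horizontal_flip": lambda img: torch.flip(img, dims=[1]),
    "rotate_90": lambda img: torch.rot90(img, k=1, dims=(0, 1)),
}

def label_preserving(augment, images, atol=1e-6):
    """True if the simulated target is unchanged by the augmentation
    for every sample in images."""
    return all(
        torch.allclose(simulate(augment(img)), simulate(img), atol=atol)
        for img in images
    )

torch.manual_seed(0)
images = [torch.randint(0, 2, (128, 128)) for _ in range(16)]
results = {name: label_preserving(fn, images) for name, fn in candidates.items()}
print(results)
```

For the real task, `simulate` would be replaced by the solver that generated the training targets. An augmentation that fails this check is not necessarily useless, but the targets would then need to be recomputed or transformed to match.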