Overfitting Issue

I am estimating age from face images where the number of classes is 101(0…100). I made 3 subgroup where group1(0-15),group2(16-64) and group3(65+). The training & validation accuracy of group1 & 3 is 90% & 75% respectively, but group2 validation performance is about (30%) whereas training accuracy is 90%. Noted that, group2 has almost 40k training images and is almost balanced. I have tried with different pretrained model(resnet, squeezenet, inception resnet v1), scratch models(resnet,vgg16, ), attention models (Squeeze and Excitation, Block Attention module ), but none of them work fine. I have changed the hyper parameters like (weight decay, learning rate scheduler). Can anyone suggest to me which models can work or how I can solve this problem?
FYI: Group2 belongs 49 classes. I am estimating the real age through classification.

Hi Mahbubul!

If you are working on this as a semi-realistic problem (in contrast to
a pure learning exercise), I would strongly suggest that you treat this
as regression problem rather than a classification problem.

You would have your final layer be a Linear with out_features = 1,
and the output value would be the predicted age. Your target
(ground-truth annotation) would be the actual age. Then use something
like MSELoss as your loss criterion.

If the true age is 50 then classifying the image as class 49 or 51
is no better or worse than classifying it as 0 or 100. So in training
you would be throwing away the (very useful) information that your
prediction was almost exactly right (even though not spot on).

On the other hand, using a regression-type loss (such as MSELoss)
would heavily penalize a prediction of 0, but only modestly disfavor a
prediction of 49 in comparison with the correct result of 50 (and this
is logically what you want).

Best.

K. Frank

Thank you very much for your comprehensive reply.
Actually, after the classification, I calculate the expected value(age) and don’t think about only top1 prediction.
expected age = sum(predicted probability * corresponding label)

I will try according to your suggestion. Thank you.

Overfitting is much more general than what you describe. What you describe is specifically the case for some neural networks that keep fitting better and better on your training data even though later on it might only be noise that it is fitting to. In general, overfitting means that your model is learning not just signal but also noise which hurts generalization.