I am estimating age from face images, where the number of classes is 101 (0…100). I made 3 subgroups: group1 (0-15), group2 (16-64), and group3 (65+). The training and validation accuracies of group1 and group3 are 90% and 75% respectively, but group2's validation accuracy is only about 30% while its training accuracy is 90%. Note that group2 has almost 40k training images and is almost balanced. I have tried different pretrained models (ResNet, SqueezeNet, Inception-ResNet v1), models trained from scratch (ResNet, VGG16), and attention models (Squeeze-and-Excitation, Block Attention Module), but none of them works well. I have also changed hyperparameters such as the weight decay and the learning rate scheduler. Can anyone suggest which models might work, or how I can solve this problem?

FYI: Group2 contains 49 classes. I am estimating the real age through classification.

Hi Mahbubul!

If you are working on this as a semi-realistic problem (in contrast to a pure learning exercise), I would strongly suggest that you treat this as a regression problem rather than a classification problem.

You would have your final layer be a `Linear` with `out_features = 1`, and the output value would be the predicted age. Your `target` (ground-truth annotation) would be the actual age. Then use something like `MSELoss` as your loss criterion.

If the true age is `50`, then classifying the image as class `49` or `51` is no better or worse than classifying it as `0` or `100`. So in training you would be throwing away the (very useful) information that your prediction was almost exactly right (even though not spot on).

On the other hand, using a regression-type loss (such as `MSELoss`) would heavily penalize a prediction of `0`, but only modestly disfavor a prediction of `49` in comparison with the correct result of `50` (and this is logically what you want).

Best.

K. Frank

Thank you very much for your comprehensive reply.

Actually, after the classification I calculate the expected value (age) rather than considering only the top-1 prediction:

expected age = sum(predicted probability * corresponding label)
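For reference, that expected-value computation can be written as follows. The logits here are random placeholders for the classifier's output over the 49 group-2 classes (ages 16…64); the class-to-age mapping is an assumption based on the group definition above.

```python
import torch

# Hypothetical logits for a batch of 2 images over the 49 group-2 classes.
logits = torch.randn(2, 49)
labels = torch.arange(16, 65, dtype=torch.float32)  # class index -> real age

probs = torch.softmax(logits, dim=1)         # predicted probabilities
expected_age = (probs * labels).sum(dim=1)   # E[age] per image, shape (2,)
```

Since the probabilities sum to 1, each expected age is a weighted average of the class ages and always lies inside the group's age range.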

I will try according to your suggestion. Thank you.

Overfitting is much more general than what you describe. What you describe is specifically the case of some neural networks that keep fitting the training data better and better even though, later on, it may only be noise that they are fitting to. In general, overfitting means that your model is learning not just the signal but also the noise, which hurts generalization.