I’m facing an overfitting problem: my model gets very high accuracy on the training set (~99.8%),
while on the validation set I’m getting much worse results (~41% accuracy).
I tried adding more data (30K more images across the training and validation sets) and also tried data augmentation, but the model just isn’t improving on the validation set.
Thanks for your reply.
I added Dropout just before the last fc layer, which didn’t help much. I’m training ResNet-50 on my dataset, which contains 600x600 images of jewellery.
Data augmentation didn’t help, and adding more data didn’t help either.
I made sure that I’m using the same preprocessing on my training and validation images.
I’m not sure what else I can do, or why it’s not converging on my validation set.
It looks like something is wrong in your pipeline. If you have > 50k images, overfitting with Dropout should not be a big problem.
Try a pretrained model and train just the last layer (don’t optimize the rest of them; just pass model.fc.parameters() to the optimizer).
If this also doesn’t help, maybe make a new random split between training and validation images (maybe the distribution of images in these two sets is too different).
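Re-splitting can be done with torch.utils.data.random_split; a minimal sketch (the dummy TensorDataset and 80/20 ratio are placeholders for the real dataset):

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Dummy dataset standing in for the real jewellery dataset.
dataset = TensorDataset(torch.randn(100, 3, 8, 8), torch.randint(0, 5, (100,)))

n_val = int(0.2 * len(dataset))
train_set, val_set = random_split(
    dataset,
    [len(dataset) - n_val, n_val],
    generator=torch.Generator().manual_seed(42),  # reproducible split
)
```

If the classes are heavily imbalanced, a stratified split (e.g. scikit-learn’s train_test_split with stratify=labels) may give more comparable sets than a purely random one.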
The last thing to try is training without data augmentation; maybe turn it off and see what happens.
Is the validation accuracy rising before the training accuracy reaches ~78%?
Are the validation predictions biased towards some classes the whole time?
The code looks fine, as long as the normalization is performed in the same way for the training and validation data.
Could you check the data stats (mean, std, min, max) for some samples taken from both DataLoaders?
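The stats check above could look like this. A sketch only: `train_loader`/`val_loader` here are dummy DataLoaders (with small fake images) standing in for the real ones, and only the first batch is inspected:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def batch_stats(loader):
    """Return (mean, std, min, max) of the first image batch from a DataLoader."""
    images, _ = next(iter(loader))
    return (images.mean().item(), images.std().item(),
            images.min().item(), images.max().item())

# Dummy loaders as placeholders; in the real check, use your existing
# training and validation DataLoaders (with the 600x600 images).
train_loader = DataLoader(
    TensorDataset(torch.randn(64, 3, 32, 32), torch.zeros(64)), batch_size=64)
val_loader = DataLoader(
    TensorDataset(torch.randn(64, 3, 32, 32), torch.zeros(64)), batch_size=64)

print("train:", batch_stats(train_loader))
print("val:  ", batch_stats(val_loader))
```

If both tuples look alike (e.g. roughly zero mean and unit std after normalization), the two pipelines are most likely consistent.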
Yup, the validation accuracy is rising, and then it starts to drop, giving me a huge loss.
At the beginning, while the validation accuracy is rising, the predictions are spread over all the classes, but as the training accuracy rises, the validation predictions become biased towards 1-2 classes (not the same classes every epoch - for example, in epoch 12 the validation predictions are biased towards classes 1 and 2, and in epoch 13 towards classes 3 and 7).
I checked my data stats (mean, std, min, and max), and the training stats are very similar to the validation stats, which is why I’m worried.
I even tried to do inference on my training set and it gave me bad results.
I’m training some of the BatchNorm layers in ResNet-50, but my batch size isn’t small; maybe it has something to do with this?
Based on the screenshot you shared, your training accuracy and confusion matrix look good.
What do you mean by “tried to do inference on my training set and it gave me bad results”?
Did you switch to eval and the results were also bad on the training set?
If your batch size isn’t small, it’s not problematic to use BatchNorm layers; the opposite (a very small batch size) could be problematic.
I switched the model to eval() and used my train DataLoader to test the model on the training set, to see if the problem also occurs there - which it did (I got bad results on my training set) - so I guess it’s not a normalization problem (which is what I thought at the beginning).
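For reference, that eval-on-training-set sanity check can be sketched as below. The tiny model and loader are placeholders for the real ResNet-50 and train DataLoader:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

@torch.no_grad()
def evaluate(model, loader, device="cpu"):
    """Top-1 accuracy of `model` over `loader`, computed in eval mode."""
    model.eval()
    correct = total = 0
    for images, labels in loader:
        preds = model(images.to(device)).argmax(dim=1)
        correct += (preds == labels.to(device)).sum().item()
        total += labels.size(0)
    return correct / total

# Dummy model and loader standing in for the real ones.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 5))
loader = DataLoader(
    TensorDataset(torch.randn(32, 3, 8, 8), torch.randint(0, 5, (32,))),
    batch_size=8)
acc = evaluate(model, loader)
```

If this number on the *training* data is far below the accuracy logged during training, the train/eval discrepancy usually points at BatchNorm running statistics rather than at the data.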
I’m using batch_size=64 for both the training and the eval phase.
I even made sure that the layers’ weights are not updating in eval mode.
I did not make any changes to the BatchNorm layers.
I tried freezing all the BatchNorm layers and training the model - same results.
I even tried to freeze the BatchNorm layers and do transfer learning - same results.
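For anyone following along, “freezing the BatchNorm layers” typically means both putting them in eval mode (so the running statistics stop updating) and disabling gradients on their affine parameters. A sketch with a tiny stand-in model (in the thread this would be the ResNet-50):

```python
import torch
import torch.nn as nn

def freeze_batchnorm(model):
    """Freeze all BatchNorm layers: stop running-stat updates and weight updates.
    Note: model.train() re-enables them, so call this after each model.train()."""
    for module in model.modules():
        if isinstance(module, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            module.eval()                    # freeze running mean/var
            for p in module.parameters():
                p.requires_grad = False      # freeze affine weight/bias

# Tiny stand-in model for illustration.
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())
model.train()
freeze_batchnorm(model)
```

A common pitfall is freezing only the gradients but leaving the layers in train mode, in which case the running statistics keep drifting and eval-mode results degrade anyway.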
Using more data should improve the model’s ability to generalize. I would start by checking the preprocessing pipeline and making sure that, e.g., the normalization is applied in both the training and validation use cases. A quick test might be to use the training dataset in the validation loop and check its performance.
I’m unsure if the model is only overfitting after the first epoch, since epoch 0 already shows a worse validation accuracy than the training one.