The accuracy of my model on the test set is about 63%, while running the trained model on the training set gives 100% accuracy.
So I thought of choosing a stronger data augmentation method using the albumentations library, and found that the accuracy on the test dataset dropped to 47% while the accuracy on the training set is only 10%. Is this due to the data augmentation? Does the model treat the unaugmented training set as images it has never seen, i.e. like a test set that is 3 times larger?
In general, it’s very hard to tell. From what you describe, your model can’t really fit the augmented dataset, which might indicate that you augmented your data too heavily. You have to run different experiments to figure out what’s going on. For example, you could start by augmenting your training dataset only slightly and see what happens to both training and testing accuracies.
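As a concrete starting point, here is a minimal sketch of a mild albumentations pipeline; the specific transforms, limits, and probabilities are illustrative choices only, not a recommendation tuned to your data:

```python
import albumentations as A

# A deliberately mild pipeline: small, label-preserving perturbations
# whose outputs stay visually close to the original images.
mild_train_transform = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.ShiftScaleRotate(shift_limit=0.05, scale_limit=0.05,
                       rotate_limit=10, p=0.5),
    A.RandomBrightnessContrast(brightness_limit=0.1,
                               contrast_limit=0.1, p=0.3),
])

# albumentations expects a NumPy array in HWC format:
# augmented = mild_train_transform(image=image)["image"]
```

If training accuracy recovers with this, you can ramp the limits up gradually until test accuracy stops improving.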
There are many posts about typical issues when training neural nets, as well as tips, but at the end of the day you are the one who has to run experiments to work out what’s going on, especially with something as odd as what you describe.
You have to run a series of experiments to find out exactly what went wrong. Practical advice is to introduce a little complexity at a time, i.e. first try a smaller number of augmentations and make sure the augmented images stay close to the original data.
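One such experiment: evaluate the trained model on both splits through the same deterministic preprocessing, so the two accuracies are actually comparable. A minimal PyTorch sketch, where `model`, `clean_train_loader`, and `test_loader` are placeholder names for your own objects:

```python
import torch

@torch.no_grad()
def accuracy(model, loader, device="cuda"):
    """Plain top-1 accuracy over a DataLoader."""
    model.eval()
    correct = total = 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        preds = model(images).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)
    return correct / total

# Diagnostic: both loaders should use the same deterministic transform.
# If accuracy on the clean training set is far below test accuracy,
# the augmented images the model trained on differ too much from the
# clean images it is being judged on.
# train_acc = accuracy(model, clean_train_loader)
# test_acc  = accuracy(model, test_loader)
```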
And just saying… you don’t augment the test set.
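In code, that just means keeping two separate pipelines; a minimal sketch, assuming a PyTorch setup and using the usual ImageNet normalization stats as placeholders:

```python
import albumentations as A
from albumentations.pytorch import ToTensorV2

# Random augmentation only on the training pipeline...
train_transform = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
    ToTensorV2(),
])

# ...while the test/validation pipeline does only deterministic
# preprocessing, with no random augmentation at all.
test_transform = A.Compose([
    A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
    ToTensorV2(),
])
```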
Btw, I didn’t get what you were asking in your last question.