Data augmentation and augmented data accuracy

tecdun · April 28, 2024, 5:45pm

Hi,

I had very little data, so I implemented my own data augmentation; it made sense in my case and was quite easy to do.

Without augmentation my CNN was extremely bad: even within the first epoch, every batch had the same prediction (other than the first 3-5 batches). Adding data completely solved this issue (I augmented the data 32-fold).
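
For reference, here is a minimal sketch of how that symptom could be checked per batch. It assumes "the same prediction" means every sample in a batch is assigned the same class. The helper names and the toy scores are hypothetical and not taken from my actual model.

```python
def predicted_classes(scores):
    """Return the index of the highest score for each sample in a batch."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


def is_collapsed(scores):
    """True if every sample in the batch is assigned the same class."""
    return len(set(predicted_classes(scores))) == 1


# Hypothetical batches of 4 samples with 3 class scores each.
collapsed_batch = [
    [0.1, 2.0, 0.3],
    [0.0, 1.5, 0.2],
    [0.2, 3.0, 0.1],
    [0.4, 0.9, 0.3],
]
healthy_batch = [
    [2.0, 0.1, 0.3],
    [0.0, 1.5, 0.2],
    [0.2, 0.1, 3.0],
    [0.4, 0.9, 0.3],
]

print(is_collapsed(collapsed_batch))  # every row picks class 1
print(is_collapsed(healthy_batch))    # rows pick classes 0, 1, 2, 1
```

Logging this flag for each batch would show whether the predictions stay stuck on one class after the first few batches.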

My overall accuracy is very high (over 90%) in training and testing.

My accuracy on unseen data (data that was not used in training or testing; I'm not sure if this has a name) is very high on original data, yet only about 50% on augmented data. So my question is: is this normal (original data accurate, augmented data not that accurate)?

In my case I will always be using original data to predict, so it doesn't really matter; it's just curiosity on my part.
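
To make the comparison between original and augmented samples concrete, here is a minimal sketch of how the two accuracies could be computed separately on unseen data. It assumes each sample carries a flag saying whether it was augmented. The function names and the toy predictions and labels are hypothetical, not my real results.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match their labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def accuracy_by_source(predictions, labels, is_augmented):
    """Compute accuracy separately for original and augmented samples."""
    groups = {"original": ([], []), "augmented": ([], [])}
    for p, y, aug in zip(predictions, labels, is_augmented):
        preds, targets = groups["augmented" if aug else "original"]
        preds.append(p)
        targets.append(y)
    # Skip a group entirely if it has no samples.
    return {name: accuracy(ps, ys) for name, (ps, ys) in groups.items() if ys}


# Hypothetical unseen set: 3 original samples followed by 4 augmented ones.
predictions = [0, 1, 1, 0, 1, 0, 1]
labels = [0, 1, 1, 0, 0, 1, 1]
is_augmented = [False, False, False, True, True, True, True]

print(accuracy_by_source(predictions, labels, is_augmented))
```

Reporting the two numbers side by side makes it easier to see whether the gap comes from the augmented samples themselves rather than from the unseen set as a whole.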