Dataset partition

lxm-001 · September 15, 2021, 2:24pm

I want to ask whether augmented images can be used for validation.

ptrblck · September 16, 2021, 6:10am

I don’t quite understand the question. Could you explain your use case and issue a bit more, please?

gphilip · September 16, 2021, 6:34am

I am not @lxm-001, but here is how I parsed their question:

I have some images that I got by applying various augmentation steps to images in the training data. Is it OK to use such augmented images as inputs to my validation step as well, or should validation be done entirely on “real” (non-augmented) data?

Perhaps they meant to ask something else, though!

ptrblck · September 16, 2021, 6:37am

Ah OK. In case that’s the question: I would try to keep the validation dataset as close as possible to the test dataset (which one should only use once the model training is done), so that the validation dataset can indeed act as a proxy for the test set (i.e. the unseen samples).
In that case, no, I would not add random data augmentation to the validation data, but would make sure to apply the same static preprocessing as in training, such as resizing and normalization.
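
To make the split concrete, here is a minimal sketch in plain Python (no dependencies), not a definitive implementation. The helper names (`normalize`, `random_hflip`, `train_transform`, `val_transform`), the tiny 2x2 "image", and the normalization constants are all made up for illustration. In a real torchvision setup the same idea would usually be expressed as two separate `transforms.Compose` pipelines, one per dataset.

```python
import random

# Assumed normalization constants, chosen only so the numbers are easy to read.
MEAN, STD = 127.5, 127.5


def normalize(img):
    """Static preprocessing: map pixel values from [0, 255] to [-1, 1]."""
    return [[(p - MEAN) / STD for p in row] for row in img]


def random_hflip(img, rng, p=0.5):
    """Random augmentation: mirror the image horizontally with probability p."""
    if rng.random() < p:
        return [row[::-1] for row in img]
    return img


def train_transform(img, rng):
    """Training pipeline: random augmentation, then static preprocessing."""
    return normalize(random_hflip(img, rng))


def val_transform(img):
    """Validation pipeline: static preprocessing only, no randomness."""
    return normalize(img)


# Hypothetical stand-in for an image: a 2D list of pixel values.
img = [[0.0, 255.0], [255.0, 0.0]]
rng = random.Random(0)

# The validation output is identical on every call.
val_outputs = [val_transform(img) for _ in range(20)]

# The training output varies between calls (flipped or not flipped).
train_outputs = [train_transform(img, rng) for _ in range(20)]
```

Both pipelines share `normalize`, so the model sees validation inputs on the same scale as training inputs, while only the training pipeline draws random numbers.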