I am interested in your view or best practices on training, validation & test strategies!
For a research project, I am working on using a CNN for defect detection on images of weld beads, using a dataset containing about 500 images. To do so, I am currently testing different models (e.g. ResNet18-50), as well as data augmentation and transfer learning techniques.
After having experimented for a while, I am wondering which way of training/validation/testing is best suited to provide an accurate measurement of performance while at the same time keeping the computational expensiveness at a reasonable level, considering I want to test as many models etc.as possible.
I guess it would be reasonable performing some sort of cross validation (cv), especially since the dataset is so small. However, after conducting some research, I could not find a clear best way to apply this and it seems to me that different people follow different strategies.
Do you use the validation set only to find the best performing epoch/weights while training and then directly test every model on the test set (and repeat this k times as a k-fold-cv)?
Or do you find the best performing model by comparing the average validation set accuracies of all models (with k-runs per model) and then check its accuracy on the testset? If so, which exact weigths do you load for testing or would one perform another cv for the best model to determine the final test accuracy?
Would it be an option to perform multiple consecutive training-validation-test runs for each model, while before each run shuffling the dataset and splitting it in “new” training-, validation- & testsets to determine an average test accuracy (like monte-carlo-cv)?
Since I could not find some sort of scientific consensus on this topic, I would be very happy to hear some of your opinions or experiences on this.
Thanks a lot!