What is the proper procedure for applying k-fold?

skyunyoo · February 17, 2019, 1:31pm

I am confused about how to evaluate a model with stratified k-fold CV. According to the documentation, performance is evaluated as the average, but I do not know what the average refers to.

So far, I have split the data into folds using stratified k-fold CV and run training and validation for each fold. After the k repetitions, I ran the test only once.

My results are as follows:

1-fold

epoch 15/149

train Loss: 0.0264 Acc: 88.0000

valid Loss: 0.0220 Acc: 89.0000

Training process is stopped early....

2-fold

epoch 16/149

train Loss: 0.0201 Acc: 91.0000

valid Loss: 0.0124 Acc: 91.0000

Training process is stopped early....

. . .

k-fold

epoch 14/149

train Loss: 0.0239 Acc: 89.0000

valid Loss: 0.0254 Acc: 90.0000

Training process is stopped early....

test Acc: 93.0000

Confusion Matrix : 
([[145., 10.],
  [ 11., 139.]])

I wonder whether I have to test on each fold.
Do I have to train again using the average number of epochs?
When I average the performance, do I average the epochs and train again with that value?
After all the hyperparameters have been tuned, do I train again with k-fold, or do I train only once with the data divided into a training set and a test set?

ptrblck · February 18, 2019, 1:26am

Usually the average of the validation accuracies (of the hold-out folds) will be calculated as the k-fold CV accuracy. The final model might be trained using the whole training data and the best hyperparameters found in your k-fold CV.

Have a look at @rasbt’s blog post and lecture notes on this topic. He might also correct me if I said something bogus.

skyunyoo · February 18, 2019, 3:51am

Thank you for your reply.
But, I still have questions about the model selection section with K-fold cross validation…

“the average of the validation accuracies”
Does this mean the best valid acc, or the valid acc at the point where training stopped due to early stopping?

“The final model might be trained using the whole training data”
Does this mean that I need to train on the entire training data only once, like a holdout, rather than repeating training k times as in k-fold? Is there no need to split the training data into k folds in this part?

justusschock · February 18, 2019, 4:38am

The idea of k-fold is to show that your model generalizes well on different subsets of your data. It also lets you keep a relatively large training set in each iteration. Once you have ensured that your model generalizes well, you might want to retrain the network on all your data (without k-fold splits) to get the best network possible by using all available training data (usually resulting in a more robust model).
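The workflow described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not anyone's actual training code: the data is synthetic, the "model" is a toy nearest-centroid classifier standing in for a network, and the helper names (`stratified_folds`, `fit_centroids`, `accuracy`, `cross_validate`) are made up for this example. It uses only the Python standard library.

```python
import random
from collections import defaultdict
from statistics import mean


def stratified_folds(labels, k, seed=0):
    """Return k lists of indices, each with roughly the same class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the indices of each class round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def fit_centroids(xs, ys):
    """Toy stand-in for training: the mean feature value per class."""
    grouped = defaultdict(list)
    for x, y in zip(xs, ys):
        grouped[y].append(x)
    return {label: mean(values) for label, values in grouped.items()}


def accuracy(model, xs, ys):
    """Fraction of samples whose nearest class centroid matches the label."""
    hits = sum(
        min(model, key=lambda label: abs(model[label] - x)) == y
        for x, y in zip(xs, ys)
    )
    return hits / len(ys)


def cross_validate(xs, ys, k=5):
    """Return the mean validation accuracy and the per-fold accuracies."""
    scores = []
    for held_out in stratified_folds(ys, k):
        held = set(held_out)
        train_idx = [i for i in range(len(ys)) if i not in held]
        model = fit_centroids([xs[i] for i in train_idx],
                              [ys[i] for i in train_idx])
        scores.append(accuracy(model,
                               [xs[i] for i in held_out],
                               [ys[i] for i in held_out]))
    return mean(scores), scores


# Synthetic 1-D data (an assumption): class 0 near 0.0, class 1 near 3.0.
rng = random.Random(42)
train_x = ([rng.gauss(0.0, 1.0) for _ in range(100)]
           + [rng.gauss(3.0, 1.0) for _ in range(100)])
train_y = [0] * 100 + [1] * 100
test_x = ([rng.gauss(0.0, 1.0) for _ in range(50)]
          + [rng.gauss(3.0, 1.0) for _ in range(50)])
test_y = [0] * 50 + [1] * 50

# Step 1: estimate generalization with k-fold CV on the training data only.
cv_accuracy, fold_scores = cross_validate(train_x, train_y, k=5)

# Step 2: retrain once on all training data, with no fold split.
final_model = fit_centroids(train_x, train_y)

# Step 3: evaluate the retrained model once on the untouched test set.
test_accuracy = accuracy(final_model, test_x, test_y)

print(f"per-fold validation accuracy: {fold_scores}")
print(f"k-fold CV accuracy (mean): {cv_accuracy:.3f}")
print(f"test accuracy of the retrained model: {test_accuracy:.3f}")
```

The point of the sketch is the ordering: the folds are used only to estimate how well the approach generalizes, the final fit uses all training data in a single run, and the test set is touched exactly once at the end.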

skyunyoo · February 18, 2019, 5:07am

Now I understand the concept of kfold to some extent. Thank you!
To ensure that my model generalizes well, which should I use: ‘the average of the valid acc when stopped due to early stopping’ or ‘the average of the best valid acc’?

rasbt · February 18, 2019, 5:42am

Say you use 5-fold cross-validation. Then you have 5 iterations, each with 1 training set (the other 4 folds) and 1 validation fold. So, in total, once you have finished iterating, you have 5 validation accuracies. Here, “the average of the validation accuracies” means that these are averaged: (valid1 + valid2 + … + valid5) / 5.
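As a small worked example of that average, here is a sketch in Python. The five accuracies are hypothetical placeholders, not results from this thread, and reporting the spread across folds alongside the mean is an optional extra rather than something the definition requires.

```python
from statistics import mean, stdev

# Hypothetical validation accuracies (in percent), one per fold of a 5-fold run.
valid_acc = [89.0, 91.0, 88.0, 92.0, 90.0]

cv_accuracy = mean(valid_acc)  # (valid1 + valid2 + ... + valid5) / 5
cv_spread = stdev(valid_acc)   # sample standard deviation across the folds

print(f"5-fold CV accuracy: {cv_accuracy:.1f} +/- {cv_spread:.2f}")
# prints "5-fold CV accuracy: 90.0 +/- 1.58"
```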

In a real-world application, you usually care about using your model in some production system. So, in this case, after you have evaluated the performance of the model, it makes sense to use even more data for training because, based on theoretical assumptions, this can only improve the model.

So, yes, for production, you may re-train your model on the whole training set after evaluation. However, note that in DL, this is typically not as necessary as in traditional machine learning, because the training set is already relatively large to begin with.

skyunyoo · February 18, 2019, 5:51am

Thank you very much for the very detailed answer.
Thanks to you, I’m learning little by little!