What is the best way to apply k-fold cross validation in CNN?

ptrblck · June 3, 2020, 5:36am

This post seems to give an additional example of how to use cross validation in skorch.

Alternatively you could also directly use sklearn.model_selection methods to create indices for the splits and recreate the datasets via Subsets in each iteration.
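
A minimal sketch of that second suggestion, assuming scikit-learn and PyTorch are installed. The toy TensorDataset, the fold count, and the batch size are placeholder assumptions, and the actual training step is only indicated by a comment.

```python
import numpy as np
import torch
from sklearn.model_selection import KFold
from torch.utils.data import DataLoader, Subset, TensorDataset

# Toy dataset standing in for your real one (assumption): 20 samples, 3 features.
dataset = TensorDataset(torch.randn(20, 3), torch.randint(0, 2, (20,)))

kfold = KFold(n_splits=5, shuffle=True, random_state=0)
fold_sizes = []
for fold, (train_idx, val_idx) in enumerate(kfold.split(np.arange(len(dataset)))):
    # Recreate the datasets from the drawn indices in each iteration.
    train_set = Subset(dataset, train_idx.tolist())
    val_set = Subset(dataset, val_idx.tolist())
    train_loader = DataLoader(train_set, batch_size=4, shuffle=True)
    val_loader = DataLoader(val_set, batch_size=4)
    # Create a fresh model here and train/evaluate it with the two loaders.
    fold_sizes.append((len(train_set), len(val_set)))
```

Subset only stores indices into the original dataset, so nothing is copied between folds.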

avh77 · June 4, 2020, 3:24pm

Thank you very much!!!

avh77 · June 4, 2020, 3:31pm

Following up on this topic, this link could help anyone interested.

M_nh_Tu_Vu · July 7, 2020, 7:58pm

I think a simple loop can do the job; there is no need for any other library.

Here is some pseudocode for illustration:

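from statistics import mean

results = []
for fold in range(total_fold):
    # split_dataset, MyModel and your_training_function are placeholders
    train_set, test_set = split_dataset(your_dataset, fold)
    model = MyModel()  # a fresh model for every fold
    results.append(your_training_function(model, train_set, test_set))
print(mean(results))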

Since we create a new model inside the loop, there is no need to reset its parameters.
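
The post does not define split_dataset, so here is one possible version, offered only as a sketch. It assumes the dataset supports len() and integer indexing, takes an extra total_fold parameter that the pseudocode does not show, and uses contiguous blocks as test folds.

```python
def split_dataset(dataset, fold, total_fold=5):
    """Return (train_set, test_set) for the given fold using contiguous blocks."""
    n = len(dataset)
    if not 0 <= fold < total_fold:
        raise ValueError("fold must be in range(total_fold)")
    # Fold boundaries; integer arithmetic spreads any remainder across folds.
    start = fold * n // total_fold
    stop = (fold + 1) * n // total_fold
    train_idx = [i for i in range(n) if i < start or i >= stop]
    test_idx = range(start, stop)
    train_set = [dataset[i] for i in train_idx]
    test_set = [dataset[i] for i in test_idx]
    return train_set, test_set


train_set, test_set = split_dataset(list(range(10)), fold=1, total_fold=5)
```

For a PyTorch dataset, the two list comprehensions could be swapped for torch.utils.data.Subset(dataset, indices) to avoid loading every sample eagerly. If the samples are stored in class order, shuffle the indices once before splitting.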

David_Alford · July 8, 2020, 10:43pm

This was useful for understanding it better. All it really does is loop over the dataset. If the dataset has already been split with random_split, then we can just loop over it a few times. I am sure it adds a little more bias than proper cross-validation, but it definitely helps with understanding!

David

skipperOVO · October 16, 2020, 9:29am

I have written a function to implement cross-validation; it may help you.

Ali_Haider · November 26, 2020, 5:59pm

I would suggest using a Subset(dataset, indices) wrapper to index a dataset that is loaded in a fixed order, and specifying the indices using some predefined criterion, for example:
“If you have 50 samples and use 5-fold validation, then for the first fold use the first 40 indices for training and the rest for testing.”
We can use this criterion to train and test on the range of samples calculated for a specific fold, save the results, and use them later.
This method is good when training takes a long time and may be interrupted.
Please correct me if I am wrong.
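
The idea above can be sketched with the standard library alone. This is an illustration under assumptions, not the poster's code: fold_indices, run_all_folds, the train_and_test callback, and the JSON results file are all hypothetical names, and the sample count is assumed to be divisible by the fold count.

```python
import json
import os


def fold_indices(n_samples, n_folds, fold):
    """Return (train_idx, test_idx) with a contiguous test block per fold.

    Fold 0 tests on the last block, so with 50 samples and 5 folds it trains
    on indices 0-39 and tests on 40-49, matching the quoted example.
    """
    size = n_samples // n_folds
    start = n_samples - (fold + 1) * size
    test_idx = list(range(start, start + size))
    train_idx = [i for i in range(n_samples) if not start <= i < start + size]
    return train_idx, test_idx


def run_all_folds(n_samples, n_folds, train_and_test, results_path):
    """Run every fold not yet recorded in results_path, saving after each one."""
    results = {}
    if os.path.exists(results_path):
        with open(results_path) as f:
            results = json.load(f)  # resume from an interrupted run
    for fold in range(n_folds):
        if str(fold) in results:
            continue  # this fold finished before the interruption
        train_idx, test_idx = fold_indices(n_samples, n_folds, fold)
        results[str(fold)] = train_and_test(train_idx, test_idx)
        with open(results_path, "w") as f:
            json.dump(results, f)
    return results
```

In practice train_and_test would wrap the two index lists in Subset(dataset, train_idx) and Subset(dataset, test_idx), train a fresh model, and return a JSON-serializable metric.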

Manjari_Ganapathy · December 3, 2020, 1:56am

Hello ptrblck… Do you have any sample code that splits the data into train, test, and val sets and also uses StratifiedKFold?

ptrblck · December 3, 2020, 5:03am

You could use the code example from here, which shows how to use sklearn.model_selection.StratifiedKFold.
Once inside the “index loop”, you could create torch.utils.data.Subsets to create the datasets using the drawn indices.
Let me know if you need more information.
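
As a rough sketch of how this could look (not the linked example itself): hold out a stratified test set once, then run StratifiedKFold on the remaining indices to get train and val splits. The toy labels, the split sizes, and the seed are assumptions; the resulting index arrays are what would be passed to torch.utils.data.Subset.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split

# Toy labels standing in for your dataset's targets (assumption): 3 classes, 20 each.
labels = np.repeat([0, 1, 2], 20)
indices = np.arange(len(labels))

# Hold out a stratified test set of 12 samples once, before any cross-validation.
trainval_idx, test_idx = train_test_split(
    indices, test_size=12, stratify=labels, random_state=0
)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
folds = []
for train_pos, val_pos in skf.split(trainval_idx, labels[trainval_idx]):
    # split() returns positions into trainval_idx, so map back to dataset indices.
    train_idx, val_idx = trainval_idx[train_pos], trainval_idx[val_pos]
    # e.g. Subset(dataset, train_idx.tolist()) and Subset(dataset, val_idx.tolist())
    folds.append((train_idx, val_idx))
```

The test indices never enter the fold loop, so the final test score stays independent of any model selection done on the val folds.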

Najeh_Nafti · September 23, 2021, 11:59am

And how is the split_dataset function defined?