Split images based on patient folders for cross validation

eyk · April 2, 2020, 8:47pm

Hi,
I need to split my images into training and validation sets for cross-validation, such that all images of a given patient end up either in the training set or in the validation set, never in both. Each patient has about 8-11 images.
root
|___ patient_1
|     |___ im1
|     |___ im2 …
|
|___ patient_2
|     |___ im1
|     |___ im2 …
|
.
.
.

How can I split the images based on the patient folders?
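A minimal sketch of reading this layout into a flat list of image paths plus a parallel list of integer patient IDs. It assumes every patient folder sits directly under root and that the images are .png/.jpg/.jpeg files (both are assumptions; `collect_images` is a hypothetical helper, not part of any library):

```python
from pathlib import Path


def collect_images(root, exts=(".png", ".jpg", ".jpeg")):
    """Walk root/<patient>/<image> and return (paths, groups).

    groups[i] is the integer ID of the patient folder containing paths[i].
    The extension filter is an assumption; adjust it to your files.
    """
    paths, groups = [], []
    # Sort the patient folders so the folder -> ID mapping is reproducible
    patient_dirs = sorted(p for p in Path(root).iterdir() if p.is_dir())
    for patient_id, patient_dir in enumerate(patient_dirs):
        # Every image inside a patient folder gets that patient's ID
        for img in sorted(patient_dir.iterdir()):
            if img.is_file() and img.suffix.lower() in exts:
                paths.append(img)
                groups.append(patient_id)
    return paths, groups
```

Calling `paths, groups = collect_images("root")` yields one patient ID per image, which is the per-image group array a patient-wise split needs. The binary targets would still have to be read from wherever the labels are stored; the thread doesn't say where that is.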

ptrblck · April 2, 2020, 9:15pm

If you have the target values for all images, you could assign a group to each patient and use sklearn.model_selection.GroupShuffleSplit.
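A minimal sketch of that idea with made-up data: six hypothetical image paths from three patients, binary targets, and one group ID per image. Each split holds out whole patients, so the returned indices can be used directly to pick the training and validation images. The `test_size=1` setting (one held-out patient per split) is only chosen to suit the toy example:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical image paths; the folder name identifies the patient
paths = np.array(["patient_1/im1", "patient_1/im2",
                  "patient_2/im1", "patient_2/im2", "patient_2/im3",
                  "patient_3/im1"])
targets = np.array([0, 0, 1, 1, 1, 0])  # made-up binary labels
groups = np.array([0, 0, 1, 1, 1, 2])   # patient ID for every image

# An int test_size is the number of held-out groups (patients) per split
gss = GroupShuffleSplit(n_splits=3, test_size=1, random_state=42)
for fold, (train_idx, val_idx) in enumerate(gss.split(paths, targets, groups=groups)):
    train_paths, val_paths = paths[train_idx], paths[val_idx]
    # No patient contributes images to both sides of the split
    assert set(groups[train_idx]).isdisjoint(groups[val_idx])
    print(fold, list(val_paths))
```

GroupShuffleSplit draws each split independently, so the same patient can appear in the validation set of more than one split. If every patient should be used for validation exactly once, sklearn.model_selection.GroupKFold is the k-fold counterpart.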

eyk · April 3, 2020, 1:01am

This is a binary classification task (targets 0 and 1). But how can I create the patient groups?

ptrblck · April 3, 2020, 3:02am

Each patient would get an index (similar to a target), and you would pass these indices as the groups argument of GroupShuffleSplit.split to get the training and validation indices split by patient.

E.g. if patient_1 has 1 image and patient_2 has 2 images, you would pass the groups as [0, 1, 1].
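The [0, 1, 1] example can be checked directly. With only two patients and half of the patients held out per split, every split has to put patient_1's single image on one side and both of patient_2's images on the other (the placeholder features `X` are an assumption for the sketch):

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

groups = np.array([0, 1, 1])    # patient_1: 1 image, patient_2: 2 images
X = np.zeros((len(groups), 1))  # placeholder features; only their count matters here

gss = GroupShuffleSplit(n_splits=4, test_size=0.5, random_state=0)
for train_idx, val_idx in gss.split(X, groups=groups):
    # val_idx is either [0] (patient_1) or [1 2] (patient_2), never [1] or [2] alone
    print("train:", train_idx, "val:", val_idx)
```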

eyk · April 3, 2020, 8:21am

Thanks for the explanation!
I have 300 patients, so does this mean my group range should be 0-299? Or do I need to make the groups based on the targets (0-1)?

ptrblck · April 3, 2020, 10:36pm

Each patient should belong to a unique group, so the range should be [0, 299].
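A sketch of the 300-patient case with simulated data: 8-11 images per patient (as in the question) and made-up per-patient labels. The group IDs run from 0 to 299, one per patient, while the binary targets are passed separately as `y`. Using the 0/1 targets as groups instead would leave only two groups, so each split would hold out an entire class rather than a set of patients.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
n_patients = 300

# Simulated data: each patient has between 8 and 11 images
images_per_patient = rng.integers(8, 12, size=n_patients)
# Patient i -> group i, repeated once per image, so group IDs span 0..299
groups = np.repeat(np.arange(n_patients), images_per_patient)
# Made-up per-patient binary labels, copied to every image of that patient
patient_labels = rng.integers(0, 2, size=n_patients)
y = np.repeat(patient_labels, images_per_patient)
X = np.zeros((groups.size, 1))  # placeholder for the images

gss = GroupShuffleSplit(n_splits=5, test_size=0.2, random_state=0)
for train_idx, val_idx in gss.split(X, y, groups=groups):
    n_train = len(np.unique(groups[train_idx]))
    n_val = len(np.unique(groups[val_idx]))
    print(f"train patients: {n_train}, val patients: {n_val}")
```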