How to Sample Datasets for Training, Validation and Testing

moreshud · January 20, 2021, 2:56pm

I have a collection of CT scan medical image datasets, and because of computational constraints I cannot use all of the data, so I would like to sample a subset for training, validation, and testing. Many of the images contain both pathologies (GGO and CON) or neither, while others contain only GGO or only CON.

Since I need to sample and split the whole dataset into training, validation, and test sets, what approach(es) do you suggest so that the validation and test sets have approximately the same statistical distribution as the training set?

moreshud · January 23, 2021, 7:27am

@KFrank @ptrblck Your contributions will be highly appreciated.

MrPositron · January 23, 2021, 6:14pm

Check the random_split function in torch.utils.data (PyTorch 1.7.0 documentation).
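
One caveat worth noting: random_split shuffles uniformly and does not look at labels, so small classes can end up unevenly represented across the splits. If matching class proportions matters, a stratified split is one option. Below is a minimal sketch in plain Python, not a definitive implementation. It assumes you can build one label per image (the label strings, the `stratified_split` helper, and the 70/15/15 fractions are all hypothetical choices made for illustration).

```python
import random
from collections import defaultdict

def stratified_split(labels, fractions=(0.7, 0.15, 0.15), max_samples=None, seed=0):
    """Return (train, val, test) index lists with similar label proportions.

    labels: one sortable, hashable label per image (hypothetical encoding).
    fractions: (train, val, test) shares; train receives whatever remains.
    max_samples: optional approximate cap on the total number of images used.
    """
    rng = random.Random(seed)

    # Group image indices by their label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    # Fraction of each class to keep when subsampling the whole dataset.
    keep = 1.0 if max_samples is None else min(1.0, max_samples / len(labels))

    train, val, test = [], [], []
    for label in sorted(by_label):
        idxs = by_label[label]
        rng.shuffle(idxs)
        # Subsample within the class, keeping at least one image per class.
        idxs = idxs[:max(1, round(len(idxs) * keep))]
        n_val = round(len(idxs) * fractions[1])
        n_test = round(len(idxs) * fractions[2])
        val.extend(idxs[:n_val])
        test.extend(idxs[n_val:n_val + n_test])
        train.extend(idxs[n_val + n_test:])
    return train, val, test

# Hypothetical label list: 100 images across four label combinations.
labels = ["none"] * 40 + ["GGO"] * 30 + ["CON"] * 20 + ["GGO+CON"] * 10
train_idx, val_idx, test_idx = stratified_split(labels, max_samples=50)
```

The returned index lists can then be wrapped with torch.utils.data.Subset(dataset, train_idx) and so on to get the three datasets. Because rounding is done per class, the split sizes are approximate for very small classes.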