Hello,
I couldn’t find any existing information that answers my questions, so I would appreciate advice on the following project.
I want to train a denoiser in PyTorch for X-ray CT reconstructions using an MSD (Mixed-Scale Dense) network. I have 24 volumetric scans, each with 2110 reconstructed 2D grayscale slices at 2560×1480, all acquired at the same detector settings. Scans come from the following experimental design:
- Preload: 12 scans
- Additional Load: 12 scans
So I effectively have 2 groups, each with 12 scans.
Questions:
- If the final model will be used on all 24 scans, do I need to include all 24 of them in the full deep learning workflow (train/val/test)? What if I used only 8 scans for training (4 from each group), 2 for validation, and 2 for the final test (1 per group), and then applied the trained weights to all 24 scans? More generally, how do we determine that the model has seen “enough” scan-level variability and that adding more scans is unlikely to improve generalisation?
- If there is a way to determine the minimal dataset size, how should it be implemented together with hyperparameter optimisation and cross-validation over the training/validation splits?
- Is the following train/validation sampling strategy reasonable for slice/patch-based training?
  - Training sets: we randomly select slices from each training scan and randomly crop each selected slice, so that every epoch trains on freshly sampled patches (e.g., a random 256 × 256 crop from each of 800 randomly selected slices per epoch).
  - Validation sets: we select a random subset of slices once and keep it fixed for all epochs, and for each chosen slice we apply a crop at a randomly chosen but then frozen centre point, so the validation patches are identical in every epoch (a tuple-like dataset of fixed pairs).
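For concreteness, this is roughly how I would implement the scan-level split from my first question — a minimal sketch in plain Python, where the scan IDs and the function name are purely illustrative:

```python
import random

def scan_level_split(scan_ids_by_group, n_train=4, n_val=1, n_test=1, seed=0):
    """Split whole scans (never individual slices) into train/val/test,
    drawing the same number from each experimental group so that both
    Preload and Additional Load are represented in every subset.

    With n_train=4, n_val=1, n_test=1 only 6 of the 12 scans per group
    enter the workflow; the remaining scans would be seen only at
    inference time, which is exactly what my question is about."""
    rng = random.Random(seed)
    split = {"train": [], "val": [], "test": []}
    for group, ids in scan_ids_by_group.items():
        ids = list(ids)
        rng.shuffle(ids)  # shuffle within the group, then slice off subsets
        split["train"] += ids[:n_train]
        split["val"] += ids[n_train:n_train + n_val]
        split["test"] += ids[n_train + n_val:n_train + n_val + n_test]
    return split

# Illustrative scan IDs for the two groups of 12 scans each
scans = {
    "preload": [f"pre_{i:02d}" for i in range(12)],
    "additional_load": [f"add_{i:02d}" for i in range(12)],
}
split = scan_level_split(scans)  # 8 train / 2 val / 2 test scans in total
```

Splitting at the scan level (rather than the slice level) is deliberate: neighbouring slices within one volume are highly correlated, so slice-level splits would leak information between train and validation.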
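And this is a sketch of the sampling strategy from my last question as two PyTorch datasets, assuming slices are already loaded as 2D float32 tensors (the class names and the noisy/clean pairing are my own, not an existing API):

```python
import random
import torch
from torch.utils.data import Dataset

class RandomPatchTrainSet(Dataset):
    """Each __getitem__ draws a fresh random slice and a fresh random crop,
    so every epoch trains on newly sampled patches."""
    def __init__(self, noisy_slices, clean_slices, patch=256, samples_per_epoch=800):
        self.noisy = noisy_slices  # list of 2D tensors (H, W)
        self.clean = clean_slices
        self.patch = patch
        self.samples_per_epoch = samples_per_epoch

    def __len__(self):
        return self.samples_per_epoch

    def __getitem__(self, _):
        i = random.randrange(len(self.noisy))
        x, y = self.noisy[i], self.clean[i]
        h, w = x.shape
        top = random.randrange(h - self.patch + 1)
        left = random.randrange(w - self.patch + 1)
        window = (slice(top, top + self.patch), slice(left, left + self.patch))
        return x[window], y[window]

class FixedPatchValSet(Dataset):
    """Crop positions are drawn once in __init__ and then frozen, so the
    validation loss is computed on identical patches every epoch
    (the 'tuple-like' dataset described above)."""
    def __init__(self, noisy_slices, clean_slices, patch=256, seed=0):
        rng = random.Random(seed)
        self.items = []
        for x, y in zip(noisy_slices, clean_slices):
            h, w = x.shape
            top = rng.randrange(h - patch + 1)
            left = rng.randrange(w - patch + 1)
            window = (slice(top, top + patch), slice(left, left + patch))
            self.items.append((x[window], y[window]))  # frozen pair

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        return self.items[idx]
```

The point of freezing the validation crops is that epoch-to-epoch changes in the validation loss then reflect only the model, not the sampling noise of re-drawn patches.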
Thank you for any advice and tips.