Loading Dataset for UNet type network

din0328 · October 30, 2018, 1:36pm

I am working on some medical data with a very limited dataset.
I have some MRI scans as training images and CT scans as the corresponding images that I want to generate from the MRIs (this is going to be a GAN that makes synthetic CTs with a U-Net-based generator).
I am coding the generator part and wanted to train my UNet to see whether it is working okay.
But I am confused about how to load the images correctly. My issue is this:

I have more MRI images than CT images. My idea is that for every epoch, treating each patient as one batch, I would load as many images as possible at random from each patient and then train my UNet. How can I do this using PyTorch's data.Dataset? Do I need a 1:1 pairing of MRI and CT images to train the network?
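One possible way to get "one patient per batch, slices in random order" is a custom batch sampler. The sketch below assumes the dataset already holds paired MRI/CT slices and that you have a list giving the patient of each sample index; `PatientBatchSampler` and `max_batch_size` are hypothetical names, not part of PyTorch. `DataLoader` accepts any iterable that yields lists of indices through its `batch_sampler` argument.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence


class PatientBatchSampler:
    """Hypothetical batch sampler: yields one batch of indices per patient.

    patient_ids[i] is the patient that dataset sample i belongs to.
    max_batch_size caps the batch (e.g. for GPU memory); 0 means no cap.
    """

    def __init__(self, patient_ids: Sequence[str], max_batch_size: int = 0, seed: int = 0):
        self.groups: Dict[str, List[int]] = defaultdict(list)
        for idx, pid in enumerate(patient_ids):
            self.groups[pid].append(idx)
        self.max_batch_size = max_batch_size
        self.rng = random.Random(seed)

    def __iter__(self) -> Iterator[List[int]]:
        patients = list(self.groups)
        self.rng.shuffle(patients)  # new patient order every epoch
        for pid in patients:
            indices = self.groups[pid][:]
            self.rng.shuffle(indices)  # random slice order within the patient
            if self.max_batch_size > 0:
                indices = indices[: self.max_batch_size]  # random subset if capped
            yield indices

    def __len__(self) -> int:
        # number of batches per epoch = number of patients
        return len(self.groups)
```

Usage would look like `DataLoader(dataset, batch_sampler=PatientBatchSampler(patient_ids))`. Note that `batch_sampler` is mutually exclusive with `batch_size`, `shuffle`, `sampler` and `drop_last`, and that batch sizes will vary per patient, which is fine as long as all slices share the same height and width so the default collate function can stack them.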

Thank you in advance!

ptrblck · October 31, 2018, 12:28am

How imbalanced is your data, and how many samples do you have?
I think you could apply some data augmentation (e.g. different level/window settings for the images) to artificially create more samples. Affine transformations might also be a good strategy. For MRI images in particular, I think you can be fairly bold with augmentation techniques.
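A minimal NumPy sketch of the two ideas, assuming single 2D slices as arrays: a level/window function with random jitter (intended for the MRI input; the level and width values are dataset-dependent assumptions, since MRI intensities are not standardized), and a paired geometric transform. The geometric part is a simplified stand-in for full affine transforms (left-right flip only); the key point for paired data is that any geometric transform must be applied identically to the MRI and its CT target. All function names are hypothetical.

```python
import numpy as np


def apply_window(img: np.ndarray, level: float, width: float) -> np.ndarray:
    """Clip intensities to [level - width/2, level + width/2] and rescale to [0, 1]."""
    low, high = level - width / 2.0, level + width / 2.0
    return (np.clip(img, low, high) - low) / (high - low)


def random_window(img: np.ndarray, rng: np.random.Generator,
                  level: float, width: float, jitter: float = 0.1) -> np.ndarray:
    """Window with level and width each perturbed by up to +/- jitter * width."""
    d_level = rng.uniform(-jitter, jitter) * width
    d_width = rng.uniform(-jitter, jitter) * width
    return apply_window(img, level + d_level, max(width + d_width, 1e-6))


def paired_flip(mri: np.ndarray, ct: np.ndarray, rng: np.random.Generator):
    """Randomly flip an MRI slice and its CT target left-right, together."""
    if rng.random() < 0.5:
        mri, ct = np.fliplr(mri), np.fliplr(ct)
    # np.fliplr returns a view with negative strides, which torch.from_numpy
    # rejects, so return contiguous copies
    return np.ascontiguousarray(mri), np.ascontiguousarray(ct)
```

For real rotations, scaling and shifts you would need an interpolating transform (e.g. from SciPy or torchvision), again applied with the same parameters to both images.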

However, let’s dig a bit into the Dataset implementation. How is your data currently stored?
Do you have a folder for each patient with DICOM images in it? If so, do you have different sessions or just different slices?
How many patients, scans, and slices do you have?
Based on this information, I'm certain we will come up with a good approach.

din0328 · October 31, 2018, 11:09am

Hello! Thank you for your response!
So I have my data in both DICOM and Analyze formats right now. However, I am using a MATLAB script to convert the Analyze files to .jpg images that I then use for the UNet (this is quite wasteful, sorry, I'm pretty new to this). Loading the data directly from DICOM/Analyze would certainly be very helpful, now and in the future. I have a folder for each patient with all of their DICOM images in it.
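Reading DICOM directly inside a `Dataset` avoids the .jpg round trip (and the 8-bit quantization that comes with it). The sketch below is a hypothetical `PairedSliceDataset` that takes a list of `(patient_id, mri_path, ct_path)` tuples, however you build it from the per-patient folders, and assumes `pydicom` is installed. The loader applies `RescaleSlope`/`RescaleIntercept` when present, which for CT gives Hounsfield units. For the Analyze files, `nibabel` can load them as 3D volumes instead (e.g. `nibabel.load(path).get_fdata()`), which you would then slice; that variant is not shown.

```python
import numpy as np
import torch
from torch.utils.data import Dataset


def load_dicom(path) -> np.ndarray:
    """Read one DICOM slice as float32, applying rescale slope/intercept if present."""
    import pydicom  # imported lazily so other loaders can be used without pydicom

    ds = pydicom.dcmread(str(path))
    img = ds.pixel_array.astype(np.float32)
    slope = float(getattr(ds, "RescaleSlope", 1.0))
    intercept = float(getattr(ds, "RescaleIntercept", 0.0))
    return img * slope + intercept


class PairedSliceDataset(Dataset):
    """Hypothetical map-style dataset over (patient_id, mri_path, ct_path) tuples."""

    def __init__(self, pairs, loader=load_dicom, transform=None):
        self.pairs = list(pairs)
        self.loader = loader        # callable: path -> 2D array
        self.transform = transform  # optional callable: (mri, ct) -> (mri, ct)

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, idx):
        _, mri_path, ct_path = self.pairs[idx]
        mri, ct = self.loader(mri_path), self.loader(ct_path)
        if self.transform is not None:
            mri, ct = self.transform(mri, ct)
        # ensure float32 and contiguous memory before handing to torch
        mri = np.ascontiguousarray(mri, dtype=np.float32)
        ct = np.ascontiguousarray(ct, dtype=np.float32)
        # add a channel dimension: (H, W) -> (1, H, W), as 2D conv layers expect
        return torch.from_numpy(mri).unsqueeze(0), torch.from_numpy(ct).unsqueeze(0)

    def patient_ids(self):
        """Patient of each sample index, e.g. for grouping batches by patient."""
        return [pid for pid, _, _ in self.pairs]
```

Passing the loader as an argument keeps the class format-agnostic, so the same dataset could wrap a NIfTI/Analyze loader later.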

Currently I am testing with a few patients (10) and only a certain part of the brain to see whether I can make the network work properly. That is about 120 images for each modality. I would say this very small dataset is not too imbalanced: I have about 10 more MRI images than CT images.
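If the U-Net generator is trained with a pixel-wise loss against the real CT, every MRI input needs its matching CT target, so the roughly 10 extra MRI slices have nothing to be compared against. A minimal sketch of pairing by `(patient, slice index)` and dropping unmatched slices, assuming the MRI and CT volumes are co-registered so that slice indices correspond (which may not hold if the two modalities have different slice spacing); `pair_slices` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def pair_slices(
    mri: Dict[str, Dict[int, str]],
    ct: Dict[str, Dict[int, str]],
) -> Tuple[List[Tuple[str, str, str]], int]:
    """Match MRI and CT slices that share (patient, slice index).

    mri and ct map patient_id -> {slice_index: file_path}. Returns the
    (patient_id, mri_path, ct_path) pairs and the number of MRI slices
    dropped because no CT slice matched them.
    """
    pairs: List[Tuple[str, str, str]] = []
    dropped = 0
    for pid in sorted(mri):
        ct_slices = ct.get(pid, {})
        for s in sorted(mri[pid]):
            if s in ct_slices:
                pairs.append((pid, mri[pid][s], ct_slices[s]))
            else:
                dropped += 1  # unpaired MRI slice: no target for the loss
    return pairs, dropped
```

Printing `dropped` is a cheap sanity check that only the expected handful of slices is discarded.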
If this network works properly then I can use my bigger data.

Thanks for your help