How to train a super-resolution deep-learning CNN model with real MRI high- and low-resolution image pairs

I’m not a native speaker. There may be some grammar or vocabulary misuse. Sorry about that.

I’m doing a master’s degree, researching super-resolution of MR human brain images with deep-learning models. After surveying SR papers, I found that most of them obtain LR images by down-sampling (bicubic interpolation or k-space degradation). So I designed a supervised 2D CNN-based model (similar to EDSR) and tried to train it with pairs of simulated MR images.
Dataset parameters are shown below.
T1-weighted images; matrix size: HR (352, 448, 448), LR (352, 224, 224); HR resolution: 0.5 × 0.5 × 0.5 mm; n = 35.
31 cases are used for training and validation (9:1 split) and 4 cases are used for testing.
Besides the satisfying results, I also have some questions.
1. Because the intensity of MR images is not like a fixed gray scale (0-255), its maximum is uncertain, and I’m confused about normalization. During training, is it better to normalize each 2D slice to (0, 1) with its own min and max, or with the min and max of the whole volume? Can I train with the former and test with the latter? (The two options are sketched below.)
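
To make the two options concrete, a minimal NumPy sketch (`volume` is a placeholder for one loaded case):

```python
import numpy as np

volume = np.random.rand(352, 448, 448).astype(np.float32)  # placeholder HR case

# Option A: normalize each 2D slice with its own min and max
slice_2d = volume[0]
slice_norm = (slice_2d - slice_2d.min()) / (slice_2d.max() - slice_2d.min() + 1e-8)

# Option B: normalize every slice with the min and max of the whole volume
vmin, vmax = volume.min(), volume.max()
volume_norm = (volume - vmin) / (vmax - vmin + 1e-8)
```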

Moreover, my adviser asked me to reconstruct real LR images into their corresponding HR images. So I acquired several real LR & HR image pairs and reconstructed the LR images with the model trained on the simulated dataset. The result is not satisfying (shown below). It seems that my model learns features that reconstruct simulated LR images well but cause artifacts in real SR images.
[Figure: SR of a real image by the model trained on the simulated dataset]
2. Is it really possible to train a model on simulated image pairs and use it to reconstruct real MR images? If so, what am I possibly doing wrong?

Then I wondered whether real LR images would be reconstructed better by a model trained on real LR & HR image pairs. So I acquired 16 cases whose HR images have the same parameters as those mentioned above, to train my model.
14 are used for training and validation; 2 are used for testing.
In order to coregister the LR and HR images better, I skull-stripped these cases with FreeSurfer before applying SPM’s coregistration function. However, I get the very blurry result shown below.
[Figure: SR of a real image by the model trained on the real image-pair dataset]

3. If the answer to Q2 is negative, it means that collecting a real dataset is the correct direction. What should I do to improve my model or data? Just collect more data, or are there some tips I need to pay attention to?
4. I implement my model with PyTorch on Google Colab. I load the data into memory and set pin_memory=True, num_workers=2 to accelerate the training process. However, this limits my dataset size because Colab only provides 12 GB of RAM. Are there other methods to reduce the memory cost or otherwise accelerate training? (One lazy-loading idea is sketched below.)
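
A sketch of what lazy, slice-wise loading could look like, so that only the requested slice is read from disk instead of caching whole volumes in RAM. It assumes NIfTI files read through nibabel’s array proxy; the paths and the slice count are placeholders:

```python
import nibabel as nib
import numpy as np
from torch.utils.data import Dataset

class LazySliceDataset(Dataset):
    """Reads one 2D slice at a time instead of caching whole volumes in RAM."""

    def __init__(self, lr_paths, hr_paths, slices_per_volume=352):
        self.lr_paths, self.hr_paths = lr_paths, hr_paths
        self.n = slices_per_volume

    def __len__(self):
        return len(self.lr_paths) * self.n

    def __getitem__(self, idx):
        vol_idx, slice_idx = divmod(idx, self.n)
        # nibabel's dataobj is an array proxy, so only this slice is read from disk
        lr = np.asarray(nib.load(self.lr_paths[vol_idx]).dataobj[slice_idx], dtype=np.float32)
        hr = np.asarray(nib.load(self.hr_paths[vol_idx]).dataobj[slice_idx], dtype=np.float32)
        return lr[None], hr[None]  # add a channel dimension
```

This trades some extra I/O per batch for a much smaller memory footprint; pin_memory=True and a couple of workers still apply on top of it.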

Thanks for reading all my questions. If there’s anything I didn’t mention that would help you solve my problems, tell me and I’ll update ASAP.

  1. Even if you do not know the minimum and maximum, you can try common normalizations like subtracting the mean and dividing by the standard deviation. I’m not sure whether the particular type of normalization is important, but you must match the normalization between the training and test sets. That means using the mean and standard deviation computed on the training set, even if they are not exactly the mean and standard deviation of the test set. (A rough sketch follows after this list.)

  2. This seems like a tricky problem overall, but you might consider adding more data augmentations or noise to your simulated dataset to make your model more robust. Are your simulated LR images simply degraded HR images? You might consider some more sophisticated augmentations from libraries such as albumentations (see the second sketch after this list).

  3. If your current amount of real data is small, then data augmentations should help a bit. However, a sanity check you can do at this stage is to see whether your model can overfit your training data and perfectly reconstruct a small training set (third sketch below).
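
For point 1, a rough sketch of matching normalization between splits; `train_volumes` is a placeholder for your list of training arrays:

```python
import numpy as np

train_volumes = [np.random.rand(8, 64, 64) for _ in range(3)]  # tiny placeholders

# Compute statistics once, on the training set only (your volumes are equally
# sized, so averaging per-volume statistics matches the global ones closely)
train_mean = float(np.mean([v.mean() for v in train_volumes]))
train_std = float(np.mean([v.std() for v in train_volumes]))

def normalize(x, mean=train_mean, std=train_std):
    # The same training-set statistics are applied to train, val, and test data
    return (x - mean) / (std + 1e-8)
```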
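
For point 2, a possible degradation-style pipeline with albumentations; the transforms and parameters here are untuned guesses rather than MRI-specific recommendations:

```python
import albumentations as A
import numpy as np

# Extra corruption applied to the simulated LR input only, to mimic
# imperfections of real acquisitions
degrade = A.Compose([
    A.GaussNoise(var_limit=(0.001, 0.01), p=0.5),  # assumes intensities in [0, 1]
    A.MotionBlur(blur_limit=3, p=0.2),
])

lr_slice = np.random.rand(224, 224).astype(np.float32)  # placeholder LR slice
lr_aug = degrade(image=lr_slice)["image"]
```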
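
For point 3, the sanity check could look like this minimal loop, where `model` and `train_loader` come from your existing setup:

```python
import torch
import torch.nn.functional as F

# On a single small batch, the model should drive the loss close to zero
# if it has enough capacity and the data pipeline is correct
lr_batch, hr_batch = next(iter(train_loader))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

for step in range(1000):
    optimizer.zero_grad()
    loss = F.l1_loss(model(lr_batch), hr_batch)
    loss.backward()
    optimizer.step()
    if step % 100 == 0:
        print(step, loss.item())
```

If the loss plateaus well above zero here, the problem is in the model or pipeline rather than in the amount of data.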

Thanks for replying. I rotate and flip my data to generate 7 other variants of each slice (sketched below). After looking into albumentations, I see that it provides many kinds of augmentations I’m not familiar with. Is there any method you would recommend applying to my MR dataset?
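
Concretely, the 8 variants are the 4 right-angle rotations of each slice plus their mirrored versions:

```python
import numpy as np

def dihedral_variants(img):
    """All 8 rotation/flip combinations of a 2D slice (4 rotations x optional flip).
    The same variant must be applied to the paired LR and HR slices."""
    variants = []
    for k in range(4):
        rotated = np.rot90(img, k)
        variants.append(rotated)
        variants.append(np.fliplr(rotated))
    return variants
```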

Also, I recently tried degrading my HR images in the frequency domain by cutting off three quarters of the high-frequency region, and I got better results than before (Gaussian blur + bicubic down-sampling). I think the features of LR images generated this way may be more similar to those of real LR images. The degradation is roughly the sketch below.
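
A NumPy sketch of this k-space degradation (not my exact training code): keeping the central half of k-space along each in-plane axis discards three quarters of it and halves the matrix size, matching my LR images.

```python
import numpy as np

def kspace_downsample(hr_slice, factor=2):
    """Simulate an LR slice by keeping only the central 1/factor of k-space per axis."""
    k = np.fft.fftshift(np.fft.fft2(hr_slice))
    h, w = hr_slice.shape
    ch, cw = h // factor, w // factor            # central window, e.g. 448 -> 224
    top, left = (h - ch) // 2, (w - cw) // 2
    k_low = k[top:top + ch, left:left + cw]      # the outer 3/4 of k-space is dropped
    lr = np.fft.ifft2(np.fft.ifftshift(k_low))
    return np.abs(lr) / (factor * factor)        # rescale so mean intensity is preserved
```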

I check whether my model overfits the training data by calculating the average PSNR on my validation data every half epoch. The model is saved whenever the PSNR exceeds the best PSNR so far. I lower the learning rate if the average PSNR does not improve for about 3 to 5 epochs, and training stops early when the validation PSNR does not improve for about 10 epochs. In the end I keep the model with the best PSNR score on the validation data. This sounds reasonable to me, but I wonder whether this is a proper way to manage the learning rate and early stopping to prevent overfitting. (The schedule is sketched below.)
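
In PyTorch terms, the schedule is roughly the following, checking once per epoch for brevity; `train_one_epoch` and `evaluate_psnr` stand in for my training and validation steps:

```python
import torch

# Reduce the LR when validation PSNR plateaus; mode='max' because higher is better
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='max', factor=0.5, patience=3)

best_psnr, stale = 0.0, 0
for epoch in range(max_epochs):
    train_one_epoch(model, train_loader, optimizer)   # placeholder training step
    val_psnr = evaluate_psnr(model, val_loader)       # placeholder validation step
    scheduler.step(val_psnr)
    if val_psnr > best_psnr:
        best_psnr, stale = val_psnr, 0
        torch.save(model.state_dict(), "best_model.pt")  # keep the best checkpoint
    else:
        stale += 1
        if stale >= 10:   # early stop after 10 epochs without improvement
            break
```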