Before getting to the actual questions, I would like to apologize in advance if they turn out to be simple ones. All my previous deep learning work was done in TensorFlow, so I am quite a novice with PyTorch.
While working on my master's thesis, my coordinators recommended that I try some of the examples made available by @MONAI as a starting point. My objective is to create a network that receives noisy medical images as input (the artifacts mainly come from undersampling and motion) and outputs the clean images. My coordinators were also very interested in having me use a diffusion network to achieve this. With all of this in mind, I tried to adapt the "2d_stable_diffusion_v2_super_resolution" example made available by the MONAI team.
However, I have run into some issues that I have not been able to solve, and I would be very thankful for any help.
- In the original example, the noisy images are created as a transformation of the ground-truth images. That is not my case: I have the noisy and the ground-truth images saved in separate directories. I was importing the data as follows:
```python
import os

directory_und = os.fsencode(directory_tr_und)
directory_gt = os.fsencode(directory_tr_gt)
train_datalist = []
for file in os.listdir(directory_gt):
    filename_gt = "/home/mamil/monai/GenerativeModels/data/multicoil_train_proc_128/" + os.fsdecode(file)
    filename_und = "/home/mamil/monai/GenerativeModels/data/multicoil_train_proc_und_128/" + os.fsdecode(file)
    # one dictionary per pair, holding both keys together
    train_datalist += [{"image": filename_gt, "low_res_image": filename_und}]
```
However, when using CacheDataset on this datalist I would get an error. How can I perform the same transformations as the example, mainly the normalization of the images, when both the noisy and the ground-truth images are loaded from disk, instead of the noisy images being created as a transformation of the ground-truth images? (I have put a first sketch of what I am attempting after these two questions.)
- The diffusion model that is trained is created with 4 input channels and 3 output channels, but I do not understand why that is the case. I was thinking about not using the latent representation created by the autoencoder and instead giving the ground-truth image directly to the diffusion model as conditioning information. However, since I do not understand where the 4 input channels come from, I am not sure what additional information I should give to the model. (A second sketch of what I have in mind follows below.)
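For the first question, here is a minimal sketch of the direction I am considering: keeping one dictionary per image pair and listing both keys in each dictionary transform, so the loading and normalization are applied to the ground-truth and the undersampled image alike. The specific transforms and intensity range below are placeholders of mine, not values taken from the tutorial:

```python
from monai import transforms
from monai.data import CacheDataset, DataLoader

# Listing both keys applies the same loading and normalization to the
# ground-truth and the undersampled image of each pair. ScaleIntensityd
# here is a stand-in for whatever normalization the example performs.
train_transforms = transforms.Compose(
    [
        transforms.LoadImaged(keys=["image", "low_res_image"]),
        transforms.EnsureChannelFirstd(keys=["image", "low_res_image"]),
        transforms.ScaleIntensityd(keys=["image", "low_res_image"], minv=0.0, maxv=1.0),
    ]
)

# train_datalist is the list of {"image": ..., "low_res_image": ...}
# dictionaries built in the loop above.
train_ds = CacheDataset(data=train_datalist, transform=train_transforms)
train_loader = DataLoader(train_ds, batch_size=16, shuffle=True, num_workers=4)
```

Is this how CacheDataset is meant to be used with pre-existing image pairs, or is there a more idiomatic way to reuse the example's transforms?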
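For the second question, my current guess is that the 4 input channels in the example are the autoencoder's 3 latent channels plus the low-resolution image concatenated as a fourth conditioning channel, with the 3 output channels being the predicted noise in latent space; please correct me if I am reading that wrong. Under that assumption, an image-space variant without the autoencoder might look roughly like the sketch below; every channel count and network size here is my own placeholder, and the constructor arguments follow the GenerativeModels version of DiffusionModelUNet:

```python
import torch
from generative.networks.nets import DiffusionModelUNet

# Image-space variant without the autoencoder: 1 channel for the noisy
# image plus 1 channel for the conditioning image -> in_channels=2, and
# the network predicts noise for the single image channel -> out_channels=1.
model = DiffusionModelUNet(
    spatial_dims=2,
    in_channels=2,
    out_channels=1,
    num_res_blocks=2,
    num_channels=(128, 256, 512),
    attention_levels=(False, True, True),
    num_head_channels=(0, 256, 512),
)

# Stand-in batch: concatenate the conditioning image with the noisy image
# along the channel axis, mirroring how the example concatenates the
# low-res image with the latents before each denoising step.
x_t = torch.randn(8, 1, 128, 128)   # noisy image at timestep t
cond = torch.randn(8, 1, 128, 128)  # conditioning image from the pair
timesteps = torch.randint(0, 1000, (8,))
noise_pred = model(x=torch.cat([x_t, cond], dim=1), timesteps=timesteps)
# noise_pred has shape (8, 1, 128, 128)
```

Would concatenating the conditioning image along the channel axis like this be the right way to replace the latent-space conditioning?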
Thanks to anyone who takes the time to read this post.