Creating 3D Dataset/DataLoader with patches

Hi,
1. Yes, I hadn't used the reshape operation at that time.
2. I am curious why you have included np.random.randn?
The CT and MR patch centers come from the exact same location in the two images. I mean, the patches must be paired.
Won't np.random.randn change the correspondence of the CT and MR patches based on random.seed()?
3. I have used reshape in the following way:

patches_MR = np.array(patches_MR)  # list of patches -> numpy.ndarray
patches_MR = patches_MR.reshape(-1, 32, 32, 32)  # (N, 32, 32, 32)
patches_CT = np.array(patches_CT)
patches_CT = patches_CT.reshape(-1, 32, 32, 32)

train_dataset = Dataset(patches_MR,patches_CT)
train_loader = data.DataLoader(train_dataset,batch_size=5,shuffle=True)
print('train directory has {} samples'.format(len(train_dataset)))
# train directory has 9000 samples


for i,sample in enumerate(train_loader):
    mr = sample[0].float().to(device) 
    print('input sample shape of train_loader: {}'.format(mr.shape))
    # input sample shape of train_loader: torch.Size([5, 1, 32, 32, 32])
    # input sample shape of train_loader: torch.Size([5, 1, 32, 32, 32])
    # input sample shape of train_loader: torch.Size([5, 1, 32, 32, 32])
    # ...

Now it works as you suggested, I guess.

4. It's funny because I also don't know how it worked at that time:

# np.shape(train_dataset) 
# Out[242]: (18, 2, 500, 32, 32, 32)

but later it showed errors, and I corrected it as you mentioned.

I have the following question:

5. Is it necessary to use the Dataset class and torch.utils.data.DataLoader to feed the data to the network?
I have seen PyTorch code that does not use either of these.

  • I just created some dummy data for the example. You should of course use your paired inputs, not the garbage data. :wink:

  • It’s probably not really necessary, but wrapping the data in a Dataset allows you, e.g., to add some transformations later. Using a DataLoader, on the other hand, allows you to create batches easily, shuffle the data (the pairs will still be valid), use multiple workers, etc. It’s just a clean approach in my opinion, but since you already have the data in memory, you could also just index it manually; a minimal sketch follows.
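For reference, here is a minimal sketch of such a wrapper for the paired patches above (the class name and the unsqueeze for the channel dimension are illustrative assumptions, not the exact code from the thread):

import torch
from torch.utils import data

class PairedPatchDataset(data.Dataset):
    """Keeps MR/CT patches paired; shuffling only permutes the shared index."""
    def __init__(self, patches_MR, patches_CT):
        # add a channel dimension: (N, 32, 32, 32) -> (N, 1, 32, 32, 32)
        self.mr = torch.from_numpy(patches_MR).unsqueeze(1)
        self.ct = torch.from_numpy(patches_CT).unsqueeze(1)

    def __len__(self):
        return len(self.mr)

    def __getitem__(self, idx):
        # the same index selects both volumes, so the pairs stay valid
        return self.mr[idx], self.ct[idx]

train_dataset = PairedPatchDataset(patches_MR, patches_CT)
train_loader = data.DataLoader(train_dataset, batch_size=5, shuffle=True)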

Haha, I should’ve guessed that.
I will run the experiments on some network, and let's hope that it works.

Thanks, appreciate it.

Hello banikr,
I have a question. I read your post and the whole conversation. I am making a data loader for MRI images collected from ADNI. I have loaded a single image from the training folder; now I want to load all the MRI images iteratively and then apply a neural network for classification.
Please help me with how you load your whole MRI dataset from the directory.
I have 900 MRI images in three different folders, i.e. Alzheimer's has three main classes:
CN, MCI, and AD. I want to load all the data from each folder, but how do I do that?
Furthermore, I have read 1000 posts and tutorials, but I couldn't figure out how to implement it, as I am not much of an expert in PyTorch and 3D data handling.
I am using the following IDE and libraries:
IDE: Spyder
PyTorch and TensorFlow
Python 3.7
Thanks in advance

Hey,
I did not load the whole MRI volume into the data loader. The MR images I am using are of size 172x220x156, so they would exceed the memory the CUDA cores can load.
For image synthesis, I created 10000 patches per image and augmented the data. In your case of classification, it should be similar.
I am also doing regression analysis/prediction from the MR image, which will not work with patch-based training…so I subsampled the volumes to reduce the number of voxels per image.
Then the PyTorch data loader should work fine.
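As a rough illustration of the subsampling route, a minimal sketch (the file name is made up, and the factor of 2 is just an example; assumes the nibabel library is installed):

import nibabel as nib
import torch
import torch.nn.functional as F

# load the MR volume from a NIfTI file (hypothetical path)
vol = nib.load('subject01_T1.nii').get_fdata()    # e.g. shape (172, 220, 156)
vol = torch.from_numpy(vol).float()[None, None]   # add batch/channel dims -> (1, 1, D, H, W)

# subsample by a factor of 2 along each axis, cutting the voxel count 8x
small = F.interpolate(vol, scale_factor=0.5, mode='trilinear', align_corners=False)
print(small.shape)  # torch.Size([1, 1, 86, 110, 78])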
Let me know if you need more help.
I would suggest you use Jupyter Notebook or the PyCharm IDE for coding; I find them easy to use and convenient. Use Python 3.6 if possible; not all libraries support 3.7 yet.
Since this is a PyTorch help forum, I would ask you to stick to it, eh… :wink:


How can you make use of torch.utils.data.Dataset and torch.utils.data.DataLoader on your own data (not just the torchvision.datasets)?

Is there a way to use the built-in DataLoader that is used with the torchvision datasets on any dataset?

Yes, that’s possible and you can write your own Dataset implementation and just pass it to a DataLoader.
Have a look at this tutorial for an example.
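For completeness, a minimal sketch of such a custom Dataset (the folder path and class name are illustrative assumptions):

import os
from PIL import Image
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms

class MyImageDataset(Dataset):
    """Loads images lazily from a folder; any folder of files works the same way."""
    def __init__(self, root, transform=None):
        self.paths = [os.path.join(root, f) for f in sorted(os.listdir(root))]
        self.transform = transform

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        img = Image.open(self.paths[idx]).convert('RGB')
        if self.transform is not None:
            img = self.transform(img)
        return img

# resize so samples stack into a batch, then convert to tensors
tf = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
loader = DataLoader(MyImageDataset('/path/to/data', transform=tf), batch_size=4, shuffle=True)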

Thanks banikr for your valuable reply.
That was my objective: to pass the whole MR image into my network and have the network classify it into its respective class. But now I know that there is no method that directly takes a 3D image as input, processes it with a CNN or whatever the network is, and then classifies it; everyone uses patch-wise input to their network. Furthermore, in my project I used all of the ADNI data, so I did not use augmentation but directly processed all my MR images.
Yes, you're right that regression analysis will not help in this regard; you have to use a neural network for that purpose, as suggested.
Well, by now I am very used to the Spyder IDE, as it is among the most used IDEs nowadays.
If you don't mind, could you please show some of your data-loading code as guidance?
Thanks for reading such a long reply :innocent::innocent:

Hi banikr, how do you convert a single MRI image (or a bunch of them) in .nii format into patches?
Also, please guide me on how to subsample the same image using PyTorch.

Hi @ptrblck,
I have a question about unfold. I want to extract patches from my dataset. I use the medicaltorch library to load the data. When I use unfold, I get an error; I think that when I load the data using the DataLoader, it doesn't access the data. What can I do?
Thanks.

ROOT_DIR = "/home/elahe/data/dataset/"
img_list = os.listdir(os.path.join(ROOT_DIR, 'trainnii'))
label_list = os.listdir(os.path.join(ROOT_DIR, 'labelsnii'))
print(img_list[1])
img_list = (i.unfold(2, 32, 32).unfold(1, 32, 32).unfold(0, 32, 32) for i in img_list)
label_list = (i.unfold(2, 32, 32).unfold(1, 32, 32).unfold(0, 32, 32) for i in label_list)

filename_pairs = [(os.path.join(ROOT_DIR, 'trainnii', x), os.path.join(ROOT_DIR, 'labelsnii', y)) for x, y in zip(img_list, label_list)]
print(filename_pairs)
train_transform = transforms.Compose([
    mt_transforms.Resample(0.25, 0.25),
    mt_transforms.ElasticTransform(alpha_range=(40.0, 60.0),
                                   sigma_range=(2.5, 4.0),
                                   p=0.3),
    mt_transforms.ToTensor()]
)
train_dataset = mt_datasets.MRI2DSegmentationDataset(filename_pairs, transform=train_transform)
dataloader = DataLoader(train_dataset, batch_size=2, collate_fn=mt_datasets.mt_collate)

What error do you get?
unfold is a method that should be called on a tensor. Based on your code snippet, it looks like you are calling it on a file path (string).
Load the images, transform them to tensors, and then call unfold on them.
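For example, a minimal sketch with a dummy volume (the sizes are made up; a step equal to the window size gives non-overlapping patches):

import torch

volume = torch.randn(128, 128, 96)  # stand-in for a loaded, transformed scan

# unfold each spatial dim into windows of size 32 with step 32
patches = volume.unfold(0, 32, 32).unfold(1, 32, 32).unfold(2, 32, 32)
print(patches.shape)  # torch.Size([4, 4, 3, 32, 32, 32])

# flatten the grid of windows into a batch of patches
patches = patches.contiguous().view(-1, 32, 32, 32)
print(patches.shape)  # torch.Size([48, 32, 32, 32])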

I applied it after the transform, and it worked. Thanks a lot.
I have another question: is there any way to pair image and label files in 3D, like MRI2DSegmentationDataset does?
MRI2DSegmentationDataset pairs images in 2D, but I want to pair them in 3D.
Or should I use the patches and transform them to 2D?
Can I train my data using patches in 3D?

I’m not sure what “pairing” means in this context.
If you want to work on a segmentation use case for 3D data, it should work in the same manner as for 2D data (just with an additional dimension).

It means making a list of tuples in the format (input filename, ground truth filename).
I use MRI3DSegmentationDataset for this.
Thanks for your help.

Creating tuples of filenames shouldn't depend on the dimensionality.
Would using this codebase as a starter work for your use case?

Yes, I use this. I changed my code to the following, but it gives an error when I load my data and make the paired inputs. I use MRI3DSubVolumeSegmentationDataset and MRI3DSegmentationDataset.
First, I used MRI3DSubVolumeSegmentationDataset, but it gives the error that the input shape of each dimension should be a multiple of length plus 2 * padding.
I don't know what to do.
Is MRI3DSubVolumeSegmentationDataset meant to make patches?

ROOT_DIR= "/home/elahe/data/dataset/"
img_list = os.listdir(os.path.join(ROOT_DIR,'trainnii'))
label_list = os.listdir(os.path.join(ROOT_DIR,'labelsnii'))
filename_pairs = [(os.path.join(ROOT_DIR,'trainnii',x),os.path.join(ROOT_DIR,'labelsnii',y)) for x,y in zip(img_list,label_list)]
print(filename_pairs)

 
train_transform = transforms.Compose([
        mt_transforms.Resample(0.25, 0.25,0.25),
        mt_transforms.ToTensor()]
)

filename_pairs = mt_datasets.MRI3DSubVolumeSegmentationDataset(filename_pairs, cache=True,
                 transform=train_transform, canonical=False, length=(64,64,64), padding=0)

train_dataset = mt_datasets.MRI3DSegmentationDataset (filename_pairs,cache= True , transform=train_transform ,canonical= False)

Hello banikr,
Can I access the get_paired_patch_3D function in your code?
Thanks.

Hi ptrblck, what should I do if I want to extract overlapping patches? For example, the image is 256x256x32 and the patch is 32x32x32, with a step size of 4. Have you perhaps found any example or tutorial?

unfold should work. Have a look at this post for an example.
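A minimal sketch for those shapes (dummy tensor; a step of 4 with a window of 32 yields heavily overlapping patches):

import torch

volume = torch.randn(256, 256, 32)  # stand-in for the image

# size-32 windows with step 4 along each dim -> overlapping patches
patches = volume.unfold(0, 32, 4).unfold(1, 32, 4).unfold(2, 32, 4)
print(patches.shape)  # torch.Size([57, 57, 1, 32, 32, 32])

patches = patches.contiguous().view(-1, 32, 32, 32)
print(patches.shape)  # torch.Size([3249, 32, 32, 32])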

Hey @Aliktk,
There are different ways, mainly overlapping and non-overlapping methods.

def generate_patch_32_3(MR, Mask, cor, sag, axi):
    """
    Extract paired non-overlapping patches from an MR volume and its mask.
    :param MR: 3D MR volume
    :param Mask: 3D mask with the same shape as the MR volume
    :param cor: patch size along the coronal axis, e.g. 32
    :param sag: patch size along the sagittal axis, e.g. 32
    :param axi: patch size along the axial axis, e.g. 32
    :return: MR patches of shape [nPatch, cor, sag, axi], mask patches of shape
             [nPatch, cor/2, sag/2, axi/2] (center crops), and nPatch
    """
    # margins used to center-crop each mask patch to half the MR patch size
    hCor = int(cor / 4)
    hSag = int(sag / 4)
    hAxi = int(axi / 4)
    # the volume is covered by eight (possibly overlapping) quadrants of this shape
    qShape = [96, 128, 128]
    c = [0, MR.shape[0] - qShape[0]]
    s = [0, MR.shape[1] - qShape[1]]
    a = [0, MR.shape[2] - qShape[2]]
    nQuad = len(c) * len(s) * len(a)
    nPatch = int(nQuad * (qShape[0] / cor) * (qShape[1] / sag) * (qShape[2] / axi))
    MR_patch = np.zeros([nPatch, cor, sag, axi], dtype=np.float32)
    Mask_patch = np.zeros([nPatch, int(cor / 2), int(sag / 2), int(axi / 2)], dtype=np.int64)
    patch_count = 0
    for x in c:
        for y in s:
            for z in a:
                MR_quad = MR[x:x + qShape[0], y:y + qShape[1], z:z + qShape[2]]
                Mask_quad = Mask[x:x + qShape[0], y:y + qShape[1], z:z + qShape[2]]
                # tile the quadrant with non-overlapping patches
                for k in range(0, MR_quad.shape[0], cor):
                    for i in range(0, MR_quad.shape[1], sag):
                        for j in range(0, MR_quad.shape[2], axi):
                            MR_patch[patch_count] = MR_quad[k:k + cor, i:i + sag, j:j + axi]
                            # the mask patch is the center crop of the MR patch
                            Mask_patch[patch_count] = Mask_quad[k + hCor:k + cor - hCor,
                                                                i + hSag:i + sag - hSag,
                                                                j + hAxi:j + axi - hAxi]
                            patch_count += 1
    return MR_patch, Mask_patch, nPatch

Try the function above. You can skip the Mask variable if you don't have one. The function works on an MR volume loaded from NIfTI (.nii) data; you can use the nibabel Python library for that.
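For example, a quick usage sketch (the file names are made up; assumes nibabel is installed and the volume is at least 96x128x128, since the quadrant shape is hardcoded):

import nibabel as nib
import numpy as np

MR = nib.load('subject01_MR.nii').get_fdata().astype(np.float32)
Mask = nib.load('subject01_mask.nii').get_fdata().astype(np.int64)

# 32^3 MR patches paired with 16^3 center-cropped mask patches
MR_patch, Mask_patch, nPatch = generate_patch_32_3(MR, Mask, 32, 32, 32)
print(MR_patch.shape, Mask_patch.shape, nPatch)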