Need serious help based on the explanation below, please read once

Hello, respected forum community. I am glad to ask a question here in such a responsive environment :blush:
I am working on CNN-based detection and classification of Alzheimer's disease using MRI and clinical data.
For that I am using the ADNI dataset, where the NIfTI MRI scans are not all the same shape, but the majority of files have shape (256, 256, 166).

I collected the data from the ADNI website, cleaned it, and manually split it into training and testing folders with their respective classes, for training my model later.
The dataset is almost 17 GB.

Now my task is to load all those .nii files from the directory into memory, visualize them, and convert them into PyTorch tensors or NumPy arrays; either will work in my case.
I have been following the data-loading topics and have already asked two questions about loading data into memory.

I have tried a lot; all my attempts are in my GitHub files (My Github Files). They contain what I did over the last two months, but I still haven't found the right direction.
I think someone on this forum has worked with 3D datasets like ADNI, OASIS, etc. I need help solving my data-loading problem, and I have many questions about data loading.

My questions are:

  1. Loading 3D or 4D data in PyTorch should not be a big deal, since it has strong libraries for handling such data. Besides the data-loading tutorials, does anyone have methods or code that load 3D data without any preprocessing, augmentation, or subsampling?
  2. Is it important to convert each MRI file into patches before feeding it into memory for further processing?
  3. What is the best way to subsample my MRI files?
  4. Does loading MRI files require preprocessing?
  5. Does slicing make my work easier?

My data dimensions are (256, 256, 166), though they may vary for some images, and these files are in the .nii file format.
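On the subsampling question, one common approach (not necessarily the best, just a simple baseline) is to resample every volume down to a fixed smaller shape with `scipy.ndimage.zoom`. The target shape below is an illustrative assumption, not a recommendation:

```python
import numpy as np
from scipy.ndimage import zoom

# Assumption: half-resolution target, chosen only for illustration.
TARGET_SHAPE = (128, 128, 83)

def resample_volume(vol, target_shape=TARGET_SHAPE):
    """Resample a 3D volume to a fixed shape via interpolation."""
    factors = [t / s for t, s in zip(target_shape, vol.shape)]
    return zoom(vol, factors, order=1)  # order=1: linear interpolation

vol = np.random.rand(256, 256, 166).astype(np.float32)
out = resample_volume(vol)
print(out.shape)  # (128, 128, 83)
```

Resampling to half resolution cuts memory per volume by roughly 8x, and it also gives every sample the same shape, which makes batching straightforward.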

@nabsabs shared that for loading such data, what they typically do is create a list of direct paths to the patient directories; then, in the __getitem__ method, they index this list, read the NIfTI file directly, and convert it to a torch Tensor.

But it didn't work for me; maybe I don't know the right code, or something else, but I failed to load the data that way :disappointed:. So I am respectfully requesting that someone please help me, step by step.

I have many questions, but the problem is that no one at my institute shares or explains these things. Everyone says you have to solve problems with your own knowledge and by googling. But I couldn't find a way to figure out my problem.

Experts and people in my field, please help :pray:

I remember how hard it was when I started out in ML/PyTorch/Keras, so here is some starter code. That being said, there is a wealth of information, both on this forum and online. I suggest you break your questions into smaller parts, try to solve them first by yourself, and then post your errors on the forum. It is generally not good practice to post your entire problem for the community to solve without effort on your part (unless it's an open research question).

Any PyTorch dataset can be custom-made following the boilerplate code below. You need to create:

  1. An __init__ method that houses all the variables you need in later methods (like the list of patients below).
  2. A __len__ method that outputs the total size of your dataset; in this case it's the number of patients you have in the folder.
  3. And finally, the heart of the dataset, the __getitem__ method, which the dataloader calls each time with an index from 0 to size_of_dataset (i.e. len(self.patients)).
  • This method takes the idx-th entry from the list self.patients
  • path_to_pat is a direct path to the patient you want to load in this iteration
  • Using nib, you can load the patient's full volume into the mri variable.
  • At this point everything is NumPy, so you can normalize as you please; at the end, return the mri volume as one data sample.
  • Now, if your batch size is greater than 1, you have to make sure all your patient volumes are the same size in all dimensions, otherwise you will not be able to batch them. (You can do this by chunking or resampling; your design choice.)
import os
import nibabel as nib
import torch
from torch.utils.data import Dataset, DataLoader

class ADNI(Dataset):

    def __init__(self, path_to_adni_files):
        self.data_path = path_to_adni_files
        self.patients = os.listdir(self.data_path)

    def __len__(self):
        return len(self.patients)

    def __getitem__(self, idx):
        path_to_pat = os.path.join(self.data_path, self.patients[idx])
        mri = nib.load(path_to_pat).get_fdata()            # load the volume as a numpy array
        mri = (mri - mri.min()) / (mri.max() - mri.min())  # min-max normalize to [0, 1]
        mri = mri - mri.mean()                             # zero-center
        return torch.from_numpy(mri).float()

# example use
path_to_adni_files = "define_this_path"
dataset = ADNI(path_to_adni_files)
dataloader = DataLoader(dataset, batch_size=1, shuffle=True)
one_batch_of_pats = next(iter(dataloader))
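One practical note on the batching caveat above: since your volumes vary in shape, batch_size > 1 will fail to collate. If you prefer not to resample, a minimal sketch that zero-pads each volume up to a common target shape (the helper name pad_to_shape is my own, and it assumes no volume exceeds the target) could look like this:

```python
import numpy as np

def pad_to_shape(vol, target_shape):
    """Zero-pad a 3D volume at the end of each axis up to target_shape.
    Assumes vol is no larger than target_shape in any dimension."""
    pads = [(0, t - s) for t, s in zip(target_shape, vol.shape)]
    return np.pad(vol, pads, mode="constant")

vol = np.ones((4, 4, 3), dtype=np.float32)
out = pad_to_shape(vol, (6, 6, 4))
print(out.shape)  # (6, 6, 4)
```

You would call this inside __getitem__, right before returning, with a target shape at least as large as your biggest scan (e.g. the (256, 256, 166) majority shape, if nothing exceeds it).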


Thanks a lot! I'm also a beginner working with MRI files through deep learning in PyTorch. In particular, I am trying to build a 3D convolutional neural network to extract features and predict gender labels.

Good to hear! In the __getitem__ method you should also load the label, in your case the gender label, and return it as part of the output. You can load it similarly to how you load the patient data.
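A minimal sketch of what "loading the label" could look like, assuming (purely as an illustration) that the sex is encoded in the filename; your actual label source (a CSV file, a folder name) may well differ:

```python
# Hypothetical filename convention: "<subject-id>_<sex>.nii", e.g. "sub-001_F.nii".
SEX_TO_LABEL = {"M": 0, "F": 1}

def label_from_filename(fname):
    """Parse the sex label from a filename following the convention above."""
    stem = fname.split(".")[0]          # drop ".nii" / ".nii.gz"
    return SEX_TO_LABEL[stem.split("_")[-1]]

print(label_from_filename("sub-001_F.nii"))  # 1

# In __getitem__ you would then return both pieces:
#     return mri, label_from_filename(self.patients[idx])
```

The dataloader will then yield (volume_batch, label_batch) pairs, which is the form most training loops expect.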

Thank you very much, bro! For me, the goal is to build a gender classification model from some existing MRI files to predict their gender labels. Getting the whole MRI files and their gender labels into PyTorch as tensors is just the first step. I don't know how to build my classification model next; in particular, I want to build a 3D convolution model :hot_face:
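Since you ask about building a 3D convolution model: a minimal, untuned sketch using nn.Conv3d could look like the following. All layer widths and kernel sizes here are illustrative placeholders, not a validated architecture:

```python
import torch
import torch.nn as nn

class Simple3DCNN(nn.Module):
    """Minimal 3D CNN sketch for binary (e.g. gender) classification."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=3, padding=1),   # (N, 1, D, H, W) -> (N, 8, D, H, W)
            nn.ReLU(),
            nn.MaxPool3d(2),                             # halve each spatial dim
            nn.Conv3d(8, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),                     # -> (N, 16, 1, 1, 1), any input size
        )
        self.classifier = nn.Linear(16, num_classes)

    def forward(self, x):                                # x: (N, 1, D, H, W)
        x = self.features(x).flatten(1)                  # -> (N, 16)
        return self.classifier(x)                        # -> (N, num_classes)

model = Simple3DCNN()
dummy = torch.randn(2, 1, 32, 32, 21)  # small dummy batch for a shape check
print(model(dummy).shape)  # torch.Size([2, 2])
```

Note the input needs a channel dimension, so a volume from the dataset above would need mri.unsqueeze(0) before batching. The AdaptiveAvgPool3d layer lets the classifier head accept volumes of varying sizes, though same-size inputs are still required within a batch.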