Can a bunch of frames be an input to a neural network?

Hello, I am trying to predict whether a brain tumor has a specific genetic variant (MGMT promoter) from the MRI scans of the brain. I will try to be explicit and give enough context on the problem. There are 585 patients in the training dataset, each containing 4 folders of different MRI modes: T1w, T2w, T1wCE and FLAIR. Inside each of the 4 folders, there are hundreds of dicom files. Also, each patient is assigned a target value of either 0 (brain tumor doesn’t have MGMT promoter) or 1 (there is MGMT promoter in a tumor).

I have written a custom Dataset for data handling, but I am not sure what the input to the neural network should be. Specifically, I would like to treat a group of dicoms from a single mode (for example, FLAIR) as a single input, but then the shape of the resulting NumPy array is (number_of_dicoms, 224, 224), as the images are grayscale. Is it possible to feed a tensor of such shape to a neural network, or should I feed each dicom separately (e.g. a 224x224 tensor)?

Have you looked at 3D convolutional neural networks? There are multiple notebooks in this Kaggle competition that show how this can be done. There are also multiple discussion posts on the same topic.
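
To make the idea concrete, here is a minimal sketch of feeding a (number_of_dicoms, 224, 224) stack to a 3D CNN, assuming PyTorch. The names `Tiny3DCNN` and `to_fixed_depth`, the layer sizes, and the target depth of 32 slices are made up for illustration and are not taken from any competition notebook. Since each patient has a different number of slices, the sketch resamples every stack to a fixed depth so that patients can be batched together; the random tensors stand in for real scans.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Tiny3DCNN(nn.Module):
    """Toy 3D CNN producing one logit per MRI volume (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(8, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),  # collapses any (D, H, W) to (1, 1, 1)
        )
        self.classifier = nn.Linear(16, 1)

    def forward(self, x):
        # x has shape (batch, channels=1, depth, height, width)
        x = self.features(x)
        return self.classifier(x.flatten(1))  # shape (batch, 1)


def to_fixed_depth(volume, depth=32):
    """Resample a (num_slices, H, W) stack to (1, depth, H, W)."""
    _, height, width = volume.shape
    v = volume[None, None].float()  # (1, 1, num_slices, H, W)
    v = F.interpolate(v, size=(depth, height, width),
                      mode="trilinear", align_corners=False)
    return v[0]  # drop the batch dim, keep the single grayscale channel


# Random stand-ins for two patients with different slice counts
patients = [torch.rand(130, 224, 224), torch.rand(97, 224, 224)]
batch = torch.stack([to_fixed_depth(p) for p in patients])

model = Tiny3DCNN()
with torch.no_grad():  # inference only, keeps memory use low
    logits = model(batch)

print(batch.shape)   # torch.Size([2, 1, 32, 224, 224])
print(logits.shape)  # torch.Size([2, 1])
```

The logits could then be trained against the 0/1 MGMT target with `nn.BCEWithLogitsLoss`. The extra dimension of size 1 is the channel axis: the slices are treated as depth rather than as channels, which is what lets the 3D kernels slide across neighbouring slices.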
