Using Conv3d with a dataset whose volumes have different depths

I am working with medical images: I have 130 patient volumes, and each volume consists of N DICOM images (slices).

The problem is that the number of slices N varies between volumes.

About half of the volumes (50%) have 20 slices. The rest differ by 3 or 4 slices, and some by more than 10 (so much that interpolating to equalize the number of slices between volumes is not possible).

I can use Conv3d for volumes where the depth N (number of slices) is the same, but I have to use the entire dataset for the classification task. How do I incorporate the entire dataset and feed it to my network?

Similar to varying spatial sizes in images, you could try to resize the DICOM volume.
However, since you are dealing with medical images, the resampling might not be straightforward.
@christianperone might have a good idea as the author of MedicalTorch. I couldn’t find a resampling method for the z-dimension, but I might have missed it while skimming through the code.

The last time I had to resample a DICOM image I used ITK for it, but I assume there should be easier methods by now.
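As a rough illustration of resizing along the slice axis, here is a minimal sketch using `torch.nn.functional.interpolate`. The helper name `resize_depth` and the `(C, D, H, W)` tensor layout are assumptions made for this example. Note that it interpolates in index space only and ignores the physical slice spacing stored in the DICOM headers, so it is not a substitute for proper resampling with a tool such as ITK.

```python
import torch
import torch.nn.functional as F


def resize_depth(volume: torch.Tensor, target_depth: int) -> torch.Tensor:
    """Resize a (C, D, H, W) float volume to target_depth slices.

    Height and width are left unchanged. Hypothetical helper, not part
    of PyTorch or MedicalTorch.
    """
    _, _, height, width = volume.shape
    # Trilinear interpolation expects a 5D input: (N, C, D, H, W).
    resized = F.interpolate(
        volume.unsqueeze(0),
        size=(target_depth, height, width),
        mode="trilinear",
        align_corners=False,
    )
    return resized.squeeze(0)


vol = torch.randn(1, 17, 64, 64)  # one channel, 17 slices
print(resize_depth(vol, 20).shape)  # torch.Size([1, 20, 64, 64])
```

Whether interpolating between slices is acceptable depends on the modality and on how far apart the slices are, so this may not fit volumes whose slice counts differ a lot.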


There is no single answer, as there are many approaches. You will probably want to resample first (offline, not during mini-batch sampling) to a common voxel space. This is a very important step to make sure the axes have the same physical meaning across volumes.

When resampling is not an option because the volume sizes are very different, people usually do the same thing that is done in 2D: train and run inference on patches, which in 3D are cubes. That is also why not many people use 3D; it is very cumbersome to work with.

Everything I said so far is usually done for segmentation. Since you seem to be doing classification, you have more options, such as final layers that are independent of the spatial size, for example global average pooling. You can also pad the other volumes to avoid issues with batching.

There are many approaches, and all of them depend heavily on your application domain: unlike natural images, medical imaging has a lot of peculiarities depending on whether it is CT, MRI, etc. I would also consider converting the DICOM files to NIfTI, as they are much easier to work with. Hope this helps a little, but again, consider your application domain when making these decisions.
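To make the padding and global average pooling ideas concrete, here is a minimal sketch. The names `pad_collate` and `TinyVolumeClassifier`, the layer sizes, and the `(C, D, H, W)` layout are all assumptions chosen for illustration, not a recommended architecture. The collate function zero-pads each volume along depth to the deepest volume in the batch, and the model ends in `nn.AdaptiveAvgPool3d(1)` so the linear layer never sees the depth.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def pad_collate(batch):
    """Zero-pad (C, D, H, W) volumes along depth to the largest D in the batch."""
    volumes, labels = zip(*batch)
    max_depth = max(v.shape[1] for v in volumes)
    padded = [
        # F.pad takes pairs starting from the last dim: (W, W, H, H, D, D).
        F.pad(v, (0, 0, 0, 0, 0, max_depth - v.shape[1]))
        for v in volumes
    ]
    return torch.stack(padded), torch.tensor(labels)


class TinyVolumeClassifier(nn.Module):
    """Small 3D CNN whose head is independent of the input depth."""

    def __init__(self, in_channels=1, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(in_channels, 8, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(8, 16, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # Global average pooling collapses (D, H, W) to a single value per channel.
        self.pool = nn.AdaptiveAvgPool3d(1)
        self.fc = nn.Linear(16, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = self.pool(x).flatten(1)  # (N, 16) regardless of depth
        return self.fc(x)


# Two volumes with different slice counts, batched via padding.
batch = [(torch.randn(1, 20, 32, 32), 0), (torch.randn(1, 17, 32, 32), 1)]
x, y = pad_collate(batch)
model = TinyVolumeClassifier()
print(x.shape)         # torch.Size([2, 1, 20, 32, 32])
print(model(x).shape)  # torch.Size([2, 2])
```

The collate function can be passed to a `DataLoader` through its `collate_fn` argument. One caveat of this sketch: the zero-padded slices are included in the global average, so heavily padded volumes get diluted features; masking the padding or using a batch size of 1 without padding are possible alternatives.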