So, there is not a single answer to that, as there are many possible approaches. But you'll probably want to start by resampling (offline, not during mini-batch sampling) to a common voxel spacing. That is a very important step: it guarantees that a voxel has the same physical meaning across axes for different volumes.

When resampling is not an option and you have very different volume sizes, what people usually do is the same as in 2D: train and run inference on patches, which extended to 3D are cubes. That is also one reason why not many people work in 3D; it is cumbersome to deal with.

Everything I said above is usually done for segmentation. Since it seems you're doing a classification task, you'll have more options, such as using final layers that are independent of the spatial size (e.g., global average pooling). You can also pad the other images to a common size to avoid issues with batching.

Anyway, there are many approaches, and which one fits will depend heavily on your application domain: unlike training with natural images, medical imaging has a lot of peculiarities depending on whether it is CT, MRI, etc. I would also consider converting the DICOM files to NIfTI, as they are much easier to work with.

Hope this helps a little, but again, you should consider your application domain when making these decisions.
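The offline resampling step can be sketched roughly like this (a minimal sketch using `scipy.ndimage.zoom`; the function name `resample_to_spacing` and the spacing values are just illustrative, and real pipelines usually use SimpleITK or similar to read the spacing from the image header):

```python
import numpy as np
from scipy.ndimage import zoom


def resample_to_spacing(volume, spacing, target_spacing=(1.0, 1.0, 1.0), order=1):
    """Resample a 3D volume to a common voxel spacing (done offline, once).

    spacing / target_spacing are in mm per voxel along each axis;
    order=1 means (tri)linear interpolation.
    """
    factors = [s / t for s, t in zip(spacing, target_spacing)]
    return zoom(volume, factors, order=order)


# Toy volume with anisotropic 1 x 1 x 2 mm voxels:
vol = np.random.rand(40, 40, 20)
resampled = resample_to_spacing(vol, spacing=(1.0, 1.0, 2.0))
print(resampled.shape)  # -> (40, 40, 40): now isotropic 1 mm voxels
```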
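Patch-based training in 3D just means sliding a cubic window over the volume. A minimal NumPy sketch (patch size, stride, and the helper name are illustrative choices; real setups often sample patches randomly or with overlap):

```python
import numpy as np


def extract_patches(volume, patch_size=(32, 32, 32), stride=(32, 32, 32)):
    """Collect non-overlapping cubic patches from a 3D volume."""
    D, H, W = volume.shape
    pd, ph, pw = patch_size
    sd, sh, sw = stride
    patches = []
    for z in range(0, D - pd + 1, sd):
        for y in range(0, H - ph + 1, sh):
            for x in range(0, W - pw + 1, sw):
                patches.append(volume[z:z + pd, y:y + ph, x:x + pw])
    return np.stack(patches)


vol = np.zeros((64, 64, 64))
patches = extract_patches(vol)
print(patches.shape)  # -> (8, 32, 32, 32): 2 x 2 x 2 cubes
```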
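To see why global average pooling makes the classifier head independent of spatial size: it collapses the spatial dimensions to one value per channel, so differently sized volumes yield the same feature-vector length. A toy NumPy illustration (channel count and shapes are made up):

```python
import numpy as np


def global_average_pool(features):
    """Average over spatial dims (D, H, W), keeping channels.

    features: array of shape (channels, D, H, W) -> output (channels,)
    """
    return features.mean(axis=(1, 2, 3))


small = np.random.rand(16, 8, 8, 8)    # feature map from a small volume
large = np.random.rand(16, 20, 24, 18)  # feature map from a larger one
print(global_average_pool(small).shape, global_average_pool(large).shape)
# Both are (16,), so the same fully-connected head works for either input.
```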
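The padding workaround for batching can be sketched like this (zero-padding up to the per-batch maximum shape; the helper name is illustrative, and in practice you may prefer padding with the background intensity of your modality rather than zero):

```python
import numpy as np


def pad_to_shape(volume, target_shape):
    """Zero-pad a 3D volume at the end of each axis up to target_shape."""
    pad = [(0, t - s) for s, t in zip(volume.shape, target_shape)]
    return np.pad(volume, pad, mode="constant")


vols = [np.ones((30, 40, 20)), np.ones((32, 36, 25))]
# Pad every volume to the element-wise maximum shape in the batch:
target = tuple(max(v.shape[i] for v in vols) for i in range(3))
batch = np.stack([pad_to_shape(v, target) for v in vols])
print(batch.shape)  # -> (2, 32, 40, 25)
```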