Convolution with a stack of images

Hi everyone!
I have stacks of medical images (CT scans), where each stack is labeled as a whole (malignant or benign). I would like to train a model to predict whether a patient’s tumor is malignant or benign. Here is a list of issues I can’t find solutions for:

  1. Loading the data: Is it necessary to use a dataloader? All the tutorials I’ve seen seem to use one. Is a plain iterable list of (image stack, label) pairs sufficient?

  2. Variable number of images: The number of images differs from stack to stack. How do I overcome this? Is there a way to build a model that works around this issue without grouping/padding?

  3. conv2d or conv3d: I can’t seem to find any information online about conv3d being used for anything other than colour images with 3 channels. My images are grayscale, so do I use conv2d or treat the number of images in the stack as the number of channels? I’m also struggling to find good resources for understanding how conv3d works, so any links would be great.

Any and all help is much appreciated! Thanks :grinning:

I’m sorry if I’ve asked for too much in a single post. New to this branch of computer science so please go easy :sweat_smile:

Hi, did you manage to solve this problem? I have the same questions.

Hope to hear from you soon.
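
One possible way to approach the three questions, as a minimal sketch that assumes PyTorch (the thread never names the framework, so that is a guess based on the conv2d/conv3d and dataloader terminology). The class name `StackClassifier`, the layer sizes, and the random tensors standing in for CT stacks are all made up for illustration. The idea: treat each grayscale stack as a single-channel volume for `Conv3d`, which expects input of shape (batch, channels, depth, height, width), and finish with adaptive pooling so the number of slices can vary without padding. A plain Python list of (stack, label) pairs is enough to iterate over; a `DataLoader` mainly adds batching, shuffling and parallel loading, and with stacks of different depths its default collation cannot stack them into one batch, so the sketch processes one stack at a time.

```python
import torch
import torch.nn as nn


class StackClassifier(nn.Module):
    """Tiny 3D CNN: input (batch, 1, depth, height, width) -> (batch, 2) logits."""

    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            # in_channels=1 because the slices are grayscale
            nn.Conv3d(1, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            # Downsample height and width only; leave the slice axis alone
            nn.MaxPool3d(kernel_size=(1, 2, 2)),
            nn.Conv3d(8, 16, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # Collapses any (depth, height, width) to 1x1x1, so depth may vary
        self.pool = nn.AdaptiveAvgPool3d(1)
        self.classifier = nn.Linear(16, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = self.pool(x).flatten(1)  # (batch, 16)
        return self.classifier(x)


torch.manual_seed(0)

# Hypothetical stand-in data: random "stacks" with 12, 20 and 7 slices.
# A plain list of (stack, label) pairs; no DataLoader involved.
dataset = [
    (torch.randn(12, 32, 32), 0),
    (torch.randn(20, 32, 32), 1),
    (torch.randn(7, 32, 32), 0),
]

model = StackClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One pass over the list, one stack per step (effective batch size 1)
for stack, label in dataset:
    x = stack.unsqueeze(0).unsqueeze(0)  # (D, H, W) -> (1, 1, D, H, W)
    y = torch.tensor([label])
    loss = loss_fn(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Every stack yields the same output shape regardless of its slice count
with torch.no_grad():
    output_shapes = [tuple(model(s[None, None]).shape) for s, _ in dataset]
```

By contrast, feeding the slices to `Conv2d` as channels would fix the slice count at whatever `in_channels` is set to, which is why the sketch uses `Conv3d` instead. Whether this beats padding or resampling every stack to a common depth is not something the sketch settles; it only shows that the shapes work out.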