I have an image with M channels, height H, and width W, and I want to apply a channel-wise convolution, so I thought of using the Conv3d class. Currently, my image has shape (M, H, W).
However, the docs specify that the input must have shape (N, Cin, D, H, W). I know that N is the minibatch size, H is the height, and W is the width, but I am confused about Cin and D.
From what I understand, Cin is the number of input channels of the image, but what does D mean? To convolve my image, should I pass it as (N, 1, M, H, W) or as (N, M, 1, H, W)?
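
To make the two candidate layouts concrete, here is a minimal sketch, assuming the class in question is PyTorch's `torch.nn.Conv3d`. The sizes, the `(1, 3, 3)` kernel, the padding, and the choice of output channels are all made-up values for illustration, not taken from any real setup.

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for illustration.
M, H, W = 4, 8, 8
image = torch.randn(M, H, W)

# Layout A: (N, 1, M, H, W), so Cin = 1 and D = M.
x_a = image.unsqueeze(0).unsqueeze(0)  # add batch dim, then a single-channel dim
conv_a = nn.Conv3d(in_channels=1, out_channels=1,
                   kernel_size=(1, 3, 3), padding=(0, 1, 1))
y_a = conv_a(x_a)

# Layout B: (N, M, 1, H, W), so Cin = M and D = 1.
x_b = image.unsqueeze(0).unsqueeze(2)  # add batch dim, then a depth dim of size 1
conv_b = nn.Conv3d(in_channels=M, out_channels=M,
                   kernel_size=(1, 3, 3), padding=(0, 1, 1))
y_b = conv_b(x_b)

print(x_a.shape, y_a.shape)
print(x_b.shape, y_b.shape)
```

In layout A, M sits in the D (depth) slot; with a depth kernel of size 1, the same 3x3 filter is applied to each of the M slices separately. In layout B, M sits in the Cin slot, so with the default `groups=1` every output channel sums over all M input channels.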