Help understanding 3D Convolution

Manuel_Alejandro_Dia · May 15, 2019, 7:57am

I have an image composed of M channels, each of height H and width W, and I want to apply a channel-wise convolution, so I thought of using the Conv3d class. Currently, my image has shape (M, H, W).

But in the docs they specify that the input must be (N, Cin, D, H, W). What I know is that N is the minibatch size, H is the height and W is the width. But I am getting confused about Cin and D.

From what I understand, Cin is the number of channels of the image, but what does D mean? To do the convolution on my image should I pass it like (N, 1, M, H, W) or like (N, M, 1, H, W)?

Thanks!

JuanFMontesinos · May 15, 2019, 10:09am

3D convolutions are meant for data with an extra temporal (or depth) dimension, in short, a video.
D is, in this case, the number of images (frames) in the sequence.
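
To make the two layouts from the question concrete, here is a minimal sketch of what Conv3d does with each one. The sizes, the number of output channels, and the kernel sizes are made up for illustration and are not from the original post.

```python
import torch
import torch.nn as nn

N, M, H, W = 2, 5, 8, 8          # hypothetical sizes for illustration
image = torch.randn(N, M, H, W)  # a batch of M-channel images

# Option A: (N, 1, M, H, W). One input channel, M is treated as the depth D.
# The kernel slides along M as well as along H and W.
conv_a = nn.Conv3d(in_channels=1, out_channels=4, kernel_size=3, padding=1)
out_a = conv_a(image.unsqueeze(1))

# Option B: (N, M, 1, H, W). M input channels, depth D = 1.
# The depth kernel size has to be 1 here, so this behaves like a 2D convolution.
conv_b = nn.Conv3d(in_channels=M, out_channels=4,
                   kernel_size=(1, 3, 3), padding=(0, 1, 1))
out_b = conv_b(image.unsqueeze(2))

print(tuple(out_a.shape))  # (2, 4, 5, 8, 8)
print(tuple(out_b.shape))  # (2, 4, 1, 8, 8)
```

In option A the M axis survives in the output because the filter moves along it. In option B the channels are summed over by each filter, just as in Conv2d.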

Manuel_Alejandro_Dia · May 15, 2019, 1:08pm

Thanks man! Seems like I’ll be using Conv2d then.
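
For reference, a rough sketch of one way to do this with Conv2d, assuming "channel-wise" means that each channel is filtered independently with its own kernel (a depthwise convolution). The sizes and the 3x3 kernel are placeholders, not something stated in the thread.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

M, H, W = 5, 8, 8                 # hypothetical sizes for illustration
image = torch.randn(M, H, W)      # shape (M, H, W), as in the question

# groups=M gives every input channel its own 3x3 filter (depthwise convolution).
depthwise = nn.Conv2d(M, M, kernel_size=3, padding=1, groups=M, bias=False)

batch = image.unsqueeze(0)        # add the batch dimension: (1, M, H, W)
out = depthwise(batch)

# Sanity check: filtering channel 0 on its own with the first kernel
# should match channel 0 of the grouped output.
single = F.conv2d(batch[:, :1], depthwise.weight[:1], padding=1)
same = torch.allclose(out[:, :1], single, atol=1e-5)

print(tuple(out.shape))  # (1, 5, 8, 8)
```

Without `groups=M`, a plain Conv2d mixes all M channels in every output channel, which may or may not be what is wanted.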

ahkarami · December 1, 2019, 9:16pm

@Manuel_Alejandro_Dia & @JuanFMontesinos
There is a great reference for understanding the different kinds of convolution operators (3D convolution, spatially separable convolution, depthwise separable convolution, etc.):