Help understanding 3D Convolution

(Manuel Alejandro Diaz Zapata) #1

I have an image composed of M channels of height H and width W, and I want to apply a channel-wise convolution, so I thought of using the Conv3d class. Currently, my image has shape (M, H, W).

But the docs specify that the input must have shape (N, Cin, D, H, W). What I know is that N is the minibatch size, H is the height and W is the width, but I am confused about Cin and D.

From what I understand, Cin is the number of channels of the image, but what does D mean? To do the convolution on my image, should I pass it as (N, 1, M, H, W) or as (N, M, 1, H, W)?

Thanks!
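To make the two candidate layouts concrete, here is a minimal sketch assuming PyTorch's `nn.Conv3d` and hypothetical sizes (N=2, M=5, H=W=32, 8 output channels). In (N, 1, M, H, W) the M channels become the depth axis D; in (N, M, 1, H, W) they are the input channels and D is 1. Neither layout, as configured here, gives each channel its own independent filter.

```python
import torch
import torch.nn as nn

N, M, H, W = 2, 5, 32, 32  # hypothetical sizes for illustration
x = torch.randn(N, M, H, W)  # batch of images with M channels

# Layout A: (N, 1, M, H, W) -> one input channel, the M channels become depth D.
# A 3x3x3 kernel then also slides along the channel axis, mixing neighbouring channels.
conv_a = nn.Conv3d(in_channels=1, out_channels=8, kernel_size=3, padding=1)
out_a = conv_a(x.unsqueeze(1))
print(out_a.shape)  # torch.Size([2, 8, 5, 32, 32])

# Layout B: (N, M, 1, H, W) -> M input channels, depth D = 1.
# With D = 1 a kernel depth of 1 is the natural choice, so this acts like
# an ordinary 2D convolution that mixes all M channels.
conv_b = nn.Conv3d(in_channels=M, out_channels=8, kernel_size=(1, 3, 3), padding=(0, 1, 1))
out_b = conv_b(x.unsqueeze(2))
print(out_b.shape)  # torch.Size([2, 8, 1, 32, 32])
```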

(Juan F Montesinos) #2

3D convolutions are meant to deal with temporal structures (or other volumetric data, e.g. CT scans), in short, a video.
In this case, D is the number of frames (images) stacked along the time axis.
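For the video case, a minimal sketch assuming PyTorch's `nn.Conv3d` and a hypothetical batch of small RGB clips (2 clips, 3 colour channels, 8 frames of 32x32 pixels). Here Cin is the colour channels and D is the number of frames:

```python
import torch
import torch.nn as nn

# Hypothetical batch of RGB clips: N clips, C colour channels, T frames, each H x W.
N, C, T, H, W = 2, 3, 8, 32, 32
clips = torch.randn(N, C, T, H, W)  # layout (N, Cin, D, H, W) with D = T frames

# A 3x3x3 kernel spans 3 consecutive frames as well as a 3x3 spatial window,
# so each output value mixes information across time and space.
conv = nn.Conv3d(in_channels=C, out_channels=16, kernel_size=3, padding=1)
features = conv(clips)
print(features.shape)  # torch.Size([2, 16, 8, 32, 32])
```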

(Manuel Alejandro Diaz Zapata) #3

Thanks man! Seems like I’ll be using Conv2d then.
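If "channel-wise" means convolving each channel independently with its own filter, one way to do that with `nn.Conv2d` is the `groups` argument, i.e. a depthwise convolution. A minimal sketch, assuming PyTorch and hypothetical sizes (N=2, M=5, H=W=32):

```python
import torch
import torch.nn as nn

N, M, H, W = 2, 5, 32, 32  # hypothetical sizes for illustration
x = torch.randn(N, M, H, W)

# groups=M splits the M input channels into M groups of one channel each,
# so every channel is convolved with its own 3x3 filter and channels never mix.
depthwise = nn.Conv2d(in_channels=M, out_channels=M, kernel_size=3, padding=1, groups=M)
y = depthwise(x)
print(y.shape)                 # torch.Size([2, 5, 32, 32])
print(depthwise.weight.shape)  # torch.Size([5, 1, 3, 3]): one 3x3 filter per channel
```

The weight shape shows the design: with `groups=M`, each output channel sees only `in_channels / groups = 1` input channel.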