Input for conv3d

arjun_pukale · September 22, 2020, 10:49am

I have video data where each video has shape (30, 224, 224): 30 is the number of frames, and each frame is grayscale with size (224, 224).
I want to use conv3d, but conv3d expects input of the form (N, Cin, D, H, W).
Here H, W = 224, 224 and
N is the batch size.
What do Cin and D represent, and what should their values be in my case?

mariosasko · September 22, 2020, 12:01pm

Cin is the number of channels in each image/frame (I assume 1, since they are grayscale). D is the depth dimension, which in your case is the number of frames; the kernel also slides along this dimension, so the convolution captures relationships between neighboring frames.

So your input should have the shape (N, 1, 30, 224, 224).
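
As a minimal sketch of that reshaping, assuming the videos arrive as a tensor of shape (N, 30, 224, 224): the batch size of 2, the random data, and the layer settings (`out_channels=4`, `kernel_size=3`, `padding=1`) are placeholders chosen for illustration, not values from the question.

```python
import torch
import torch.nn as nn

# Placeholder batch: 2 grayscale videos, each with 30 frames of 224x224 pixels.
videos = torch.randn(2, 30, 224, 224)  # (N, D, H, W)

# Insert the missing channel dimension at index 1:
# (N, D, H, W) -> (N, Cin=1, D, H, W)
x = videos.unsqueeze(1)

# in_channels must match Cin; the other arguments are arbitrary example values.
conv = nn.Conv3d(in_channels=1, out_channels=4, kernel_size=3, padding=1)
out = conv(x)

print(x.shape)    # torch.Size([2, 1, 30, 224, 224])
print(out.shape)  # torch.Size([2, 4, 30, 224, 224])
```

With a kernel size of 3, padding of 1, and the default stride of 1, the D, H, and W sizes are preserved, so only the channel dimension changes from 1 to 4.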