I have video data where each video has the shape (30, 224, 224),
where 30 is the number of frames and each frame is grayscale with size (224, 224).
I want to use conv3d, but the conv3d input has the form (N, Cin, D, H, W).
Here H, W = 224, 224 and N is the batch size.
What do Cin and D represent, and what must their values be in my case?
Cin is the number of channels in each frame (1 in your case, assuming the frames are grayscale). D is the depth dimension, which for video is the frame axis; the kernel slides along it, so the convolution can pick up relationships among neighboring frames.
So your input should have the shape (N, 1, 30, 224, 224).
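
As a minimal sketch of that reshaping, assuming PyTorch and a batch stored as (N, 30, 224, 224): the batch size of 2, the random data, and the layer settings (4 output channels, kernel size 3, padding 1) are made up for illustration and are not from the question.

```python
import torch
import torch.nn as nn

# Hypothetical batch of 2 grayscale videos, each (30, 224, 224): shape (N, D, H, W)
videos = torch.randn(2, 30, 224, 224)

# Insert a channel axis of size 1 at position 1: (N, D, H, W) -> (N, Cin, D, H, W)
x = videos.unsqueeze(1)

# Illustrative layer; in_channels must match Cin = 1
conv = nn.Conv3d(in_channels=1, out_channels=4, kernel_size=3, padding=1)

with torch.no_grad():  # shape check only, no gradients needed
    out = conv(x)

print(x.shape)    # torch.Size([2, 1, 30, 224, 224])
print(out.shape)  # torch.Size([2, 4, 30, 224, 224])
```

With kernel size 3, padding 1, and the default stride of 1, the D, H, and W sizes are preserved; only the channel count changes from 1 to the chosen number of output channels.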