Input shape for conv3d

I have video data where each video has the shape (30, 224, 224): 30 is the number of frames, and each frame is grayscale with size (224, 224).
I want to use conv3d, but its input must have the form (N, Cin, D, H, W).
Here H, W = 224, 224
and N is the batch size.
What do Cin and D represent, and what should their values be in my case?

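Cin is the number of input channels per frame, which is 1 for grayscale frames (it would be 3 for RGB). D is the depth dimension, which for video is the number of frames (30 in your case). The 3D kernel slides along D as well as along H and W, which is how the convolution captures relationships between neighboring frames.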

So your input should have the shape (N, 1, 30, 224, 224).
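
As a minimal sketch, assuming PyTorch and a batch stored as (N, 30, 224, 224), the channel axis can be added with unsqueeze before the tensor is passed to nn.Conv3d. The batch size of 2 and the out_channels, kernel_size and padding values are illustrative choices, not anything your data requires.

```python
import torch
import torch.nn as nn

# Hypothetical batch of 2 grayscale videos: (N, D, H, W) = (2, 30, 224, 224)
videos = torch.randn(2, 30, 224, 224)

# Insert the channel axis at position 1 -> (N, Cin, D, H, W)
x = videos.unsqueeze(1)

# in_channels=1 because the frames are grayscale.
# out_channels, kernel_size and padding are arbitrary, for illustration only.
conv = nn.Conv3d(in_channels=1, out_channels=8, kernel_size=3, padding=1)
y = conv(x)

print(x.shape)  # torch.Size([2, 1, 30, 224, 224])
print(y.shape)  # torch.Size([2, 8, 30, 224, 224])
```

With kernel_size=3 and padding=1 at the default stride of 1, the output keeps the same D, H and W as the input, and only the channel dimension changes.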