Hi, I have a greyscale video input that I’m trying to classify. There are ~62 frames in the video and each individual frame is 64x64 pixels. I was wondering if there is any difference between using a conv2d with 62 input channels (one per frame) and using a conv3d with a depth of 62 and 1 input channel. Conv3d seems to make more sense in this context, but since there’s only 1 channel I’m wondering if it actually makes any difference in practice?
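For concreteness, here is a rough sketch of the two setups I’m comparing, written as plain Python arithmetic rather than actual layer code. The 16 output channels, the 3-wide kernel, stride 1 and no padding are all made-up values for illustration, and `conv_output_size` is just a helper I wrote for the standard output-size formula.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Output length along one axis for a standard (non-dilated) convolution."""
    return (size + 2 * padding - kernel) // stride + 1

frames, height, width = 62, 64, 64  # from the video described above
out_channels = 16                   # hypothetical choice
k = 3                               # hypothetical kernel size on every axis

# Option A: conv2d, frames treated as 62 input channels.
# Each filter spans all 62 frames at once, so the time axis is gone after this layer.
conv2d_weights = out_channels * frames * k * k
conv2d_out_shape = (
    out_channels,
    conv_output_size(height, k),
    conv_output_size(width, k),
)

# Option B: conv3d, 1 input channel, frames treated as depth.
# Each filter only sees k consecutive frames and slides along the time axis.
conv3d_weights = out_channels * 1 * k * k * k
conv3d_out_shape = (
    out_channels,
    conv_output_size(frames, k),
    conv_output_size(height, k),
    conv_output_size(width, k),
)

print("conv2d:", conv2d_weights, "weights, output shape", conv2d_out_shape)
print("conv3d:", conv3d_weights, "weights, output shape", conv3d_out_shape)
```

So under these assumed settings the conv2d version has far more weights per filter and collapses the time axis in the first layer, while the conv3d version shares a small kernel across time and keeps a temporal axis in its output. What I can’t tell is whether that matters for classification accuracy.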
Thanks in advance for any help.