I have a tensor with shape [5, 1, 3, 126, 126], which represents a video (5 frames, each a 126x126 RGB image).
I need to forward it through

self.resnet = nn.Sequential(
    nn.Conv3d(5, 5, 1),
    nn.UpsamplingBilinear2d(size=None, scale_factor=0.5)
)
but I get
RuntimeError: Given groups=1, weight of size [5, 5, 1, 1, 1], expected input[5, 1, 3, 126, 126] to have 5 channels, but got 1 channels instead
I think I have probably misunderstood how Conv3d works, but I can't understand why the expected dimensions are so different from the ones my 5D tensor actually has at that point.
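To check my understanding, I wrote the minimal snippet below. The reshape to (batch=1, channels=3, frames=5, 126, 126) is only my guess at how the (N, C_in, D, H, W) layout from the Conv3d docs maps onto my video, not something taken from my actual model:

import torch
import torch.nn as nn

# 5 frames, each a 1 x 3 x 126 x 126 RGB image
video = torch.randn(5, 1, 3, 126, 126)

# My guess: Conv3d wants (N, C_in, D, H, W), i.e. RGB on the channel
# axis and the 5 frames on the depth axis.
video_reshaped = video.squeeze(1).permute(1, 0, 2, 3).unsqueeze(0)
print(video_reshaped.shape)  # torch.Size([1, 3, 5, 126, 126])

# With that layout, in_channels would have to be 3, not 5.
conv = nn.Conv3d(in_channels=3, out_channels=5, kernel_size=1)
print(conv(video_reshaped).shape)  # torch.Size([1, 5, 5, 126, 126])

Is this the intended way to arrange the dimensions, or am I missing something about how Conv3d reads my original [5, 1, 3, 126, 126] tensor?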