Problem using conv2d - wrong tensor input shape

I need to forward a tensor of shape [1, 3, 128, 128], representing a 128x128 RGB image, into a Conv2d layer, but I get the error:

RuntimeError: Given groups=1, weight of size [128, 128, 1, 1], expected input[1, 3, 128, 128] to have 128 channels, but got 3 channels instead

PS: the image tensor is 4D because the previous step is nn.UpsamplingBilinear2d(size=None, scale_factor=2), which needs a 4D tensor as input.
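For reference, the mismatch can be reproduced with a minimal sketch. The weight shape [128, 128, 1, 1] in the error suggests the layer was created as nn.Conv2d(128, 128, 1), which is an assumption here:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 128, 128)  # [batch, channels, height, width] RGB image

# Assumed layer, inferred from the reported weight shape [128, 128, 1, 1]
conv = nn.Conv2d(in_channels=128, out_channels=128, kernel_size=1)

try:
    conv(x)
except RuntimeError as e:
    # in_channels=128 does not match the 3 channels of x
    print(e)
```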


There is no way to use such a convolution, as the input channels do not match. Or do you mean the resulting output should be of size [1, 128, 128]?

Yes, the output should be an image; that's why I set the parameters in_channels and out_channels to 128, but I can't understand why the expected input is basically inverted.
Edit: using (3, 128, 1) fixed the problem, but should I also change out_channels to 3 if I need a 128x128 RGB image, or is it correct to leave it at 128?
I'm not sure I have correctly understood how Conv2d works.

You are confusing channels with sizes. In Conv2d you specify the input/output channels, the kernel size, and some optional args like padding; you do not specify the output size. The output size is determined by kernel_size, stride, padding, etc.
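To see how those arguments determine the output size, here is a small sketch (the channel counts and kernel sizes are arbitrary illustration values). The spatial size follows H_out = (H_in + 2*padding - kernel_size) // stride + 1:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 128, 128)

# kernel_size=3 with no padding shrinks the spatial size:
# (128 + 2*0 - 3) // 1 + 1 = 126
print(nn.Conv2d(3, 16, kernel_size=3)(x).shape)            # [1, 16, 126, 126]

# padding=1 restores it: (128 + 2*1 - 3) // 1 + 1 = 128
print(nn.Conv2d(3, 16, kernel_size=3, padding=1)(x).shape)  # [1, 16, 128, 128]
```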

An image in PyTorch has three dimensions, [channel, height, width], so an RGB image is [3, height, width]. If you want a 3-channel image as the result, you need a convolution whose input channels match your input (3) and whose output channels are 3: nn.Conv2d(3, 3, kernel_size), where kernel_size is your chosen filter size.
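Concretely, an RGB-in / RGB-out convolution that keeps the 128x128 resolution could look like this (kernel_size=3 with padding=1 is one arbitrary size-preserving choice):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 128, 128)  # batch of one RGB image

# 3 input channels, 3 output channels; padding=1 keeps 128x128 for a 3x3 kernel
conv = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=3, padding=1)
y = conv(x)
print(y.shape)  # still a 3-channel 128x128 image: [1, 3, 128, 128]
```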

I have explained a little about convs in this thread; I hope it helps. There are also a lot of great demonstrations on YouTube, etc. The only thing you need to be careful about is that PyTorch is channels-first, [channel, height, width], whereas numpy is channels-last, [height, width, channel].
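If your image comes from a numpy-style [height, width, channel] array, a sketch of the usual conversion is to permute to channels-first and then add the batch dimension that Conv2d and UpsamplingBilinear2d expect:

```python
import torch

img_hwc = torch.randn(128, 128, 3)   # numpy/HWC layout: [height, width, channel]
img_chw = img_hwc.permute(2, 0, 1)   # PyTorch/CHW layout: [channel, height, width]
print(img_chw.shape)                 # [3, 128, 128]

# add the batch dimension the nn layers expect: [1, 3, 128, 128]
batch = img_chw.unsqueeze(0)
print(batch.shape)
```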