Problem using conv2d - wrong tensor input shape

You are confusing channels with sizes. In Conv2d you specify the input/output channel counts and the kernel size, plus optional arguments like padding; you do not specify the output size. The output size is determined by kernel_size, stride, padding, etc.
https://pytorch.org/docs/master/generated/torch.nn.Conv2d.html
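
For example (a minimal sketch; the channel counts and image size here are arbitrary), you can check the shape a conv produces against the formula from the docs:

```python
import torch
import torch.nn as nn

# You choose channels, kernel_size, stride, padding; the output size follows.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=5, stride=2, padding=2)

x = torch.randn(1, 3, 64, 64)      # [batch, channel, height, width]
y = conv(x)
print(y.shape)                     # torch.Size([1, 16, 32, 32])

# Same number computed by hand:
# H_out = floor((H_in + 2*padding - kernel_size) / stride) + 1
h_out = (64 + 2 * 2 - 5) // 2 + 1  # = 32
print(h_out)
```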

An image in PyTorch has three dimensions: [channel, height, width]. So an RGB image is [3, height, width]. If you want a 3-channel image as the result, the convolution must take images with the same number of channels as your input, which is 3, and produce 3 output channels: nn.Conv2d(3, 3, kernel_size), where kernel_size is whatever filter size you choose.
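
A quick sketch of that case (the 224x224 size is just an example; note that the conv itself expects a leading batch dimension, which you can add with unsqueeze):

```python
import torch
import torch.nn as nn

# 3 input channels -> 3 output channels; padding=1 with kernel_size=3
# keeps height and width unchanged.
conv = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=3, padding=1)

img = torch.randn(3, 224, 224)    # [channel, height, width]
out = conv(img.unsqueeze(0))      # add a batch dimension: [1, 3, 224, 224]
print(out.shape)                  # torch.Size([1, 3, 224, 224])
```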

I have explained a little about convolutions in this thread; I hope it helps. There are also a lot of great demonstrations on YouTube, etc. The only thing you need to be careful about is that PyTorch is channel-first, meaning [channel, height, width], while NumPy is [height, width, channel].
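
If you are loading images via NumPy, a permute handles that layout difference (again just a sketch with an arbitrary image size):

```python
import numpy as np
import torch

np_img = np.random.rand(224, 224, 3).astype(np.float32)  # [height, width, channel]

t_img = torch.from_numpy(np_img).permute(2, 0, 1)        # -> [channel, height, width]
print(t_img.shape)                                       # torch.Size([3, 224, 224])

back = t_img.permute(1, 2, 0).numpy()                    # -> [height, width, channel]
print(back.shape)                                        # (224, 224, 3)
```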