How does Conv2d actually work internally with regard to the channel size of its filters?

I have a small question that I didn't really think about until some time after finishing a small project and reflecting on the code (obviously, I am still a beginner).
Let's say we have

L1 = nn.Conv2d(in_channels=3, out_channels=10, kernel_size=.. etc...)

So basically, this performs a 2D convolution on the input image, which has 3 channels, and produces an output with 10 channels of depth.
This means the layer is applying 10 filters to the given image.

My simple question: does each filter have 3 channels internally, to match the image it is applied to?

To put it another way:
L1 = nn.Conv2d(in_channels=512, out_channels=1, kernel_size=.. etc...)
This means the input has 512 channels and we are performing an amazing convolution, haha, to reduce it to 1 channel only, so we are applying one filter.
So does this mean that this filter has 512 layers in it to match the input?


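Yes, exactly. That is why the kernel size is given only as (h, w), with no channel dimension: the depth of each filter has to match the in_channels of the Conv2d layer, so it is set for you. And you are right again: the number of output channels is the number of filters, each of size (in_channels, h, w) (with the default groups=1). Each filter produces one output channel, and those channels are stacked together at the end.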
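If you want to check this yourself, here is a minimal sketch that prints the weight shapes for the two layers from the question. The kernel size of 3 and the 8x8 random input are my own assumptions for illustration (the question leaves the kernel size out), and the variable names are made up. The last part recomputes one output value by hand to show that the single filter really spans all 512 input channels.

```python
import torch
import torch.nn as nn

# kernel_size=3 is an assumed value; the question does not specify it.
conv_a = nn.Conv2d(in_channels=3, out_channels=10, kernel_size=3)
conv_b = nn.Conv2d(in_channels=512, out_channels=1, kernel_size=3)

# Weight layout is (out_channels, in_channels, kernel_h, kernel_w).
shape_a = tuple(conv_a.weight.shape)
shape_b = tuple(conv_b.weight.shape)
print(shape_a)  # (10, 3, 3, 3): 10 filters, each 3 channels deep
print(shape_b)  # (1, 512, 3, 3): 1 filter, 512 channels deep

# Hypothetical input: batch of 1, 512 channels, 8x8 spatial size.
x = torch.randn(1, 512, 8, 8)
y = conv_b(x)
out_shape = tuple(y.shape)
print(out_shape)  # (1, 1, 6, 6): a single output channel

# Manual check of the top-left output value: multiply the 512x3x3 input
# patch by the 512x3x3 filter elementwise, sum everything, add the bias.
patch = x[0, :, 0:3, 0:3]
manual = (patch * conv_b.weight[0]).sum() + conv_b.bias[0]
matches = torch.allclose(manual, y[0, 0, 0, 0], atol=1e-4)
print(matches)
```

So the sum runs over the channel dimension as well as over height and width, which is why 512 input channels can collapse into 1 output channel with a single filter.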