Hi, I'm new to deep learning. I came across the PyTorch documentation for nn.functional.conv2d, and I have a doubt. Suppose my image is 10x10x3. If I take a kernel of size 3x3 with stride 2, do the following operations happen on my image?
A = 10x10 (channel 1) * 3x3 (kernel channel 1), stride=2  # partial map 1
B = 10x10 (channel 2) * 3x3 (kernel channel 2), stride=2  # partial map 2
C = 10x10 (channel 3) * 3x3 (kernel channel 3), stride=2  # partial map 3
So the final feature map is the addition A+B+C, right? And we use a 3-channel kernel here because the image has three channels, right? But how can I intuitively think about the filters the way I did in the explanation above? I mean, the first block of the VGG16 architecture contains something like:
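To check my understanding, I tried the channel-wise sum directly in PyTorch (a minimal sketch with random data, not a real image):

```python
import torch
import torch.nn.functional as F

# A fake 10x10 RGB "image" (batch of 1) and a single 3x3 kernel
# that spans all 3 input channels.
img = torch.randn(1, 3, 10, 10)    # (batch, channels, height, width)
kernel = torch.randn(1, 3, 3, 3)   # (out_channels=1, in_channels=3, 3, 3)

# One conv2d call: each kernel channel slides over its image channel,
# and the three partial results are summed into ONE feature map.
out = F.conv2d(img, kernel, stride=2)

# The same thing done channel by channel, then added: A + B + C.
manual = sum(
    F.conv2d(img[:, c:c+1], kernel[:, c:c+1], stride=2) for c in range(3)
)

print(out.shape)                               # torch.Size([1, 1, 4, 4])
print(torch.allclose(out, manual, atol=1e-5))  # True
```

The two results match, so it seems a 3-channel kernel really does produce a single feature map, not three.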
x = layers.Conv2D(
64, (3, 3), activation="relu", padding="same", name="block1_conv1"
)(img_input)
I'm interested to know what that 64 is, and how I can think about it intuitively, like above, for better understanding.
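My guess is that 64 means 64 independent kernels, each producing its own summed feature map. In PyTorch terms this is my attempt (random weights standing in for trained ones, and padding=1 to mimic Keras' padding="same" for a 3x3 kernel at stride 1):

```python
import torch
import torch.nn.functional as F

img = torch.randn(1, 3, 10, 10)     # a 10x10 RGB image, batch of 1
weights = torch.randn(64, 3, 3, 3)  # 64 kernels, each 3x3 across 3 channels

# Each of the 64 kernels does the A+B+C sum over the input channels,
# yielding one feature map per kernel.
out = F.conv2d(img, weights, stride=1, padding=1)
print(out.shape)  # torch.Size([1, 64, 10, 10]) -> 64 feature maps
```

Is that the right way to read the 64 in block1_conv1?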
Thanks in advance