Difficulty understanding filters and kernels

Hi, I'm new to deep learning. I came across the PyTorch documentation for nn.functional.conv2d and I have a doubt. Suppose my image is 10x10x3. If I take the kernel size as 3x3 with stride 2, do the following operations happen on my image?
A = image channel 1 (10x10) * kernel slice 1 (3x3), stride=2  # feature map from channel 1
B = image channel 2 (10x10) * kernel slice 2 (3x3), stride=2  # feature map from channel 2
C = image channel 3 (10x10) * kernel slice 3 (3x3), stride=2  # feature map from channel 3

So the final feature map is the addition A + B + C, right? And we use a 3-channel kernel because the image has three channels, right? But how can I think intuitively about the filters beyond this explanation? For example, in the VGG16 architecture the first block contains something like:

x = layers.Conv2D(
        64, (3, 3), activation="relu", padding="same", name="block1_conv1"
    )(img_input)

I'm interested to know what that 64 is and how I can think about it intuitively, like the explanation above, for better understanding.

Thanks in advance

A filter is a small window of numbers (weights) that slides over the image and captures local features.
Each filter has a different set of weights, and the stride is how many pixels the window moves at a time.
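
To convince yourself about the A + B + C part, here is a minimal sketch using torch.nn.functional.conv2d on a made-up 10x10x3 input with random values (the shapes and stride match your example; the numbers are arbitrary). It checks that a single conv2d call gives the same result as convolving each channel with its matching kernel slice and adding the three maps:

import torch
import torch.nn.functional as F

img = torch.randn(1, 3, 10, 10)        # (batch, channels, height, width)
kernel = torch.randn(1, 3, 3, 3)       # (out_channels=1, in_channels=3, 3x3 window)

# One call: every channel is convolved with its matching kernel slice
# and the three results are summed into a single feature map.
out = F.conv2d(img, kernel, stride=2)  # shape (1, 1, 4, 4)

# The same thing done channel by channel -- your A + B + C.
per_channel = sum(
    F.conv2d(img[:, c:c + 1], kernel[:, c:c + 1], stride=2) for c in range(3)
)

print(torch.allclose(out, per_channel))  # True

So yes, for an RGB image a single filter is a 3x3x3 block of weights, and its output is one feature map.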

In the VGG architecture, the 64 is the number of different filters applied to the input, which results in 64 different feature maps.
(3, 3) is the kernel size, i.e. each filter is a 3x3 window that spans all the input channels (so 3x3x3 for an RGB input).
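
On the Keras side, here is a small sketch showing where that 64 shows up in the shapes. It uses a made-up 10x10x3 input to stay close to your example (the real VGG16 input is 224x224x3):

import tensorflow as tf
from tensorflow.keras import layers

img_input = tf.keras.Input(shape=(10, 10, 3))   # toy RGB input, not the real VGG16 size
conv = layers.Conv2D(64, (3, 3), activation="relu", padding="same",
                     name="block1_conv1")
x = conv(img_input)

print(x.shape)            # (None, 10, 10, 64): 64 feature maps, one per filter
# Each of the 64 filters spans all 3 input channels, so the weight tensor is
# (kernel_height, kernel_width, in_channels, filters).
print(conv.kernel.shape)  # (3, 3, 3, 64)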

For intuition, think of it like this: each of the 64 filters learns to respond to some characteristic pattern in its input, and the corresponding output feature map shows how strongly that pattern appears at each location.

Hope this clears your doubt.